Class: Bio::Sequence::NA
- Inherits:
- String
- Includes:
- Common
- Defined in:
- lib/bio/sequence/na.rb,
lib/bio/sequence/compat.rb,
lib/bio/shell/plugin/midi.rb
Overview
-- TODO
- add "Ohno" style
- add an accessor to drum pattern
- add a new feature to select music style (pop, trans, ryukyu, ...)
- what is the base?
++
Direct Known Subclasses
Defined Under Namespace
Classes: MidiTrack
Class Method Summary
-
+ (Object) randomize(*arg, &block)
Generate a new random sequence with the given frequency of bases.
Instance Method Summary
-
- (Object) at_content
Calculate the ratio of AT / ATGC bases.
-
- (Object) at_skew
Calculate the ratio of (A - T) / (A + T) bases.
-
- (Object) codon_usage
Returns counts of each codon in the sequence in a hash.
-
- (Object) cut_with_enzyme(*args)
(also: #cut_with_enzymes)
Cuts the sequence with the given restriction enzyme(s). See Bio::RestrictionEnzyme::Analysis.cut.
-
- (Object) dna
Returns a new sequence object with any 'u' bases changed to 't'.
-
- (Object) dna!
Changes any 'u' bases in the original sequence to 't'.
-
- (Object) forward_complement
Returns a new complementary sequence object (without reversing).
-
- (Object) forward_complement!
Converts the current sequence into its complement (without reversing).
-
- (Object) gc_content
Calculate the ratio of GC / ATGC bases.
-
- (Object) gc_percent
Calculate the ratio of GC / ATGC bases as a percentage, truncated to a whole number (integer division).
-
- (Object) gc_skew
Calculate the ratio of (G - C) / (G + C) bases.
-
- (Object) illegal_bases
Returns an alphabetically sorted array of any non-standard bases (other than 'atgcu').
-
- (NA) initialize(str)
constructor
Generate a nucleic acid sequence object from a string.
-
- (Object) molecular_weight
Estimate molecular weight (using the values from BioPerl's SeqStats.pm module).
-
- (Object) names
Generate the list of the names of each nucleotide along with the sequence (full name).
-
- (Object) pikachu
:nodoc:.
-
- (Object) reverse_complement
(also: #complement)
Returns a new sequence object with the reverse complement sequence to the original.
-
- (Object) reverse_complement!
(also: #complement!)
Converts the original sequence into its reverse complement.
-
- (Object) rna
Returns a new sequence object with any 't' bases changed to 'u'.
-
- (Object) rna!
Changes any 't' bases in the original sequence to 'u'.
-
- (Object) splicing(position)
Alias of Bio::Sequence::Common splice method, documented there.
-
- (Object) to_midi(style = {}, drum = true)
style:.
-
- (Object) to_re
Create a Ruby regular expression instance (Regexp).
-
- (Object) translate(frame = 1, table = 1, unknown = 'X')
Translate into an amino acid sequence.
Methods included from Common
#+, #<<, #composition, #concat, #normalize!, #randomize, #seq, #splice, #subseq, #to_fasta, #to_s, #total, #window_search
Methods inherited from String
#fill, #fold, #skip, #step, #to_aaseq, #to_naseq
Constructor Details
- (NA) initialize(str)
Generate a nucleic acid sequence object from a string.
s = Bio::Sequence::NA.new("aagcttggaccgttgaagt")
or maybe (if you have a nucleic acid sequence in a file)
s = Bio::Sequence::NA.new(File.read('dna.txt'))
Nucleic Acid sequences are always all lowercase in bioruby
s = Bio::Sequence::NA.new("AAGcTtGG")
puts s #=> "aagcttgg"
Whitespace is stripped from the sequence
s = Bio::Sequence::NA.new("atg\nggg\ttt\r gc")
puts s #=> "atggggttgc"
Arguments:
-
(required) str: String
Returns:
Bio::Sequence::NA object
# File 'lib/bio/sequence/na.rb', line 77
def initialize(str)
  super
  self.downcase!
  self.tr!(" \t\n\r",'')
end
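The normalization performed by the constructor can be tried on a plain String without loading BioRuby. This is a minimal sketch that mirrors the source above; `normalize_na` is a hypothetical helper name and not part of the library.

```ruby
# Hypothetical helper (not a BioRuby method): lowercase the input, then
# remove spaces, tabs, newlines and carriage returns, as the constructor does.
def normalize_na(str)
  str.downcase.delete(" \t\n\r")
end

puts normalize_na("AAGcTtGG")             # prints aagcttgg
puts normalize_na("atg\nggg\ttt\r gc")    # prints atggggttgc
```

Other characters (digits, gaps, ambiguity codes) pass through unchanged; only case and the four whitespace characters are touched.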
Class Method Details
+ (Object) randomize(*arg, &block)
Generate a new random sequence with the given frequency of bases. The sequence length is determined by their cumulative sum. (See also Bio::Sequence::Common#randomize which creates a new randomized sequence object using the base composition of an existing sequence instance).
counts = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
puts Bio::Sequence::NA.randomize(counts) #=> "ggcttgttac" (for example)
You may also feed the output of randomize into a block
actual_counts = {'a'=>0, 'c'=>0, 'g'=>0, 't'=>0}
Bio::Sequence::NA.randomize(counts) {|x| actual_counts[x] += 1}
actual_counts #=> {"a"=>1, "c"=>2, "g"=>3, "t"=>4}
Arguments:
-
(optional) hash: Hash object
Returns:
Bio::Sequence::NA object
# File 'lib/bio/sequence/compat.rb', line 87
def self.randomize(*arg, &block)
  self.new('').randomize(*arg, &block)
end
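The idea of building a random sequence from a hash of base counts can be sketched in plain Ruby. This is not BioRuby's implementation; `random_sequence` is a hypothetical helper that simply expands the counts and shuffles them, so the result always has exactly the requested composition.

```ruby
# Hypothetical stand-in for NA.randomize(counts): repeat each base by its
# count, shuffle the resulting array, and join it into a string.
# The length of the result equals the sum of the counts.
def random_sequence(counts, rng = Random.new)
  counts.flat_map { |base, n| [base] * n }.shuffle(random: rng).join
end

counts = { 'a' => 1, 'c' => 2, 'g' => 3, 't' => 4 }
seq = random_sequence(counts)
puts seq.length       # prints 10
puts seq.count('t')   # prints 4
```

Passing a seeded `Random` instance as the second argument makes the output reproducible, which is convenient in tests.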
Instance Method Details
- (Object) at_content
Calculate the ratio of AT / ATGC bases. U is regarded as T.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.at_content #=> 0.444444444444444
Returns:
Float
# File 'lib/bio/sequence/na.rb', line 319
def at_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0
  return at.quo(at + gc)
end
- (Object) at_skew
Calculate the ratio of (A - T) / (A + T) bases. U is regarded as T.
s = Bio::Sequence::NA.new('atgttgttgttc')
puts s.at_skew #=> -0.75
Returns:
Float
# File 'lib/bio/sequence/na.rb', line 347
def at_skew
  count = self.composition
  a = count['a']
  t = count['t'] + count['u']
  return 0.0 if a + t == 0
  return (a - t).quo(a + t)
end
- (Object) codon_usage
Returns counts of each codon in the sequence in a hash.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.codon_usage #=> {"gcg"=>1, "tga"=>1, "atg"=>1}
This method does not validate codons! Any three letter group is a 'codon'. So,
s = Bio::Sequence::NA.new('atggNNtga')
puts s.codon_usage #=> {"tga"=>1, "gnn"=>1, "atg"=>1}
s = Bio::Sequence::NA.new('atgg--tga')
puts s.codon_usage #=> {"tga"=>1, "g--"=>1, "atg"=>1}
Also, there is no option to work in any frame other than the first.
Returns:
Hash object
# File 'lib/bio/sequence/na.rb', line 275
def codon_usage
  hash = Hash.new(0)
  self.window_search(3, 3) do |codon|
    hash[codon] += 1
  end
  return hash
end
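Since codon_usage only works in the first frame, counting in another frame means trimming the sequence first. The sketch below shows the same counting logic on a plain String with an offset parameter; `codon_counts` is a hypothetical helper, not a BioRuby method.

```ruby
# Hypothetical helper: count non-overlapping triplets starting at `offset`
# (0, 1 or 2). Trailing bases that do not fill a triplet are ignored, and,
# as in codon_usage, any three-character group counts as a "codon".
def codon_counts(seq, offset = 0)
  counts = Hash.new(0)
  seq[offset..-1].to_s.scan(/.{3}/) { |codon| counts[codon] += 1 }
  counts
end

first_frame  = codon_counts('atggcgtga')      # atg, gcg and tga, once each
second_frame = codon_counts('atggcgtga', 1)   # tgg and cgt, once each
puts first_frame.keys.join(' ')               # prints atg gcg tga
puts second_frame.keys.join(' ')              # prints tgg cgt
```

With the library itself, a comparable effect could presumably be had by taking a subsequence before calling codon_usage; that is an assumption and not shown in the source above.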
- (Object) cut_with_enzyme(*args) Also known as: cut_with_enzymes
Example:
seq = Bio::Sequence::NA.new('gaattc')
cuts = seq.cut_with_enzyme('EcoRI')
or
seq = Bio::Sequence::NA.new('gaattc')
cuts = seq.cut_with_enzyme('g^aattc')
See Bio::RestrictionEnzyme::Analysis.cut
# File 'lib/bio/sequence/na.rb', line 481
def cut_with_enzyme(*args)
  Bio::RestrictionEnzyme::Analysis.cut(self, *args)
end
- (Object) dna
Returns a new sequence object with any 'u' bases changed to 't'. The original sequence is not modified.
s = Bio::Sequence::NA.new('augc')
puts s.dna #=> 'atgc'
puts s #=> 'augc'
Returns:
new Bio::Sequence::NA object
# File 'lib/bio/sequence/na.rb', line 425
def dna
  self.tr('u', 't')
end
- (Object) dna!
Changes any 'u' bases in the original sequence to 't'. The original sequence is modified.
s = Bio::Sequence::NA.new('augc')
puts s.dna! #=> 'atgc'
puts s #=> 'atgc'
Returns:
current Bio::Sequence::NA object (modified)
# File 'lib/bio/sequence/na.rb', line 437
def dna!
  self.tr!('u', 't')
end
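One detail worth checking in your own environment: the source above returns whatever String#tr! returns, and Ruby's String#tr! returns nil when no character was changed. The plain-String sketch below illustrates that behavior; that dna! and rna! behave the same way in a given BioRuby version is an assumption based on the source shown here.

```ruby
s = 'augc'.dup               # dup so the string is not frozen
first  = s.tr!('u', 't')     # a 'u' was replaced, so the receiver is returned
second = s.tr!('u', 't')     # nothing left to replace, so nil is returned

p first    # prints "atgc"
p second   # prints nil
p s        # prints "atgc"
```

If that applies, it is safer to call the bang method for its side effect and then read the sequence object itself, instead of relying on the return value.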
- (Object) forward_complement
Returns a new complementary sequence object (without reversing). The original sequence object is not modified.
s = Bio::Sequence::NA.new('atgc')
puts s.forward_complement #=> 'tacg'
puts s #=> 'atgc'
Returns:
new Bio::Sequence::NA object
# File 'lib/bio/sequence/na.rb', line 102
def forward_complement
  s = self.class.new(self)
  s.forward_complement!
  s
end
- (Object) forward_complement!
Converts the current sequence into its complement (without reversing). The original sequence object is modified.
s = Bio::Sequence::NA.new('atgc')
puts s.forward_complement! #=> 'tacg'
puts s #=> 'tacg'
Returns:
current Bio::Sequence::NA object (modified)
# File 'lib/bio/sequence/na.rb', line 116
def forward_complement!
  if self.rna?
    self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn')
  else
    self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
  end
  self
end
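The DNA branch of the source above is a single character translation that also covers the IUPAC ambiguity codes. A minimal plain-String sketch, using the same two translation strings (`complement_dna` is a hypothetical helper name):

```ruby
# Hypothetical helper: complement a lowercase DNA string without reversing it.
# The two strings are copied from the DNA branch of forward_complement!.
def complement_dna(seq)
  seq.tr('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
end

puts complement_dna('atgc')           # prints tacg
puts complement_dna('atgcrn')         # prints tacgyn
puts complement_dna('atgc').reverse   # prints gcat
```

Reversing the complemented string gives the reverse complement, which is the same result reverse_complement! reaches by reversing first and complementing second.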
- (Object) gc_content
Calculate the ratio of GC / ATGC bases. U is regarded as T.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content #=> 0.555555555555556
Returns:
Float
# File 'lib/bio/sequence/na.rb', line 305
def gc_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0
  return gc.quo(at + gc)
end
- (Object) gc_percent
Calculate the ratio of GC / ATGC bases as a percentage. The result comes from integer division, so the fractional part is dropped, not rounded (5 of 9 bases gives 55, not 56). U is regarded as T.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_percent #=> 55
Returns:
Integer (Fixnum on Ruby versions before 2.4)
# File 'lib/bio/sequence/na.rb', line 290
def gc_percent
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0 if at + gc == 0
  gc = 100 * gc / (at + gc)
  return gc
end
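The arithmetic in the source above can be checked on a plain String. The sketch below uses a hypothetical helper, `gc_percent_of`, that counts bases with String#count where the library uses composition, and contrasts integer division with true rounding.

```ruby
# Hypothetical helper: GC percentage of a lowercase sequence string,
# counting only a, t, u, g and c, as the source above does.
def gc_percent_of(seq)
  gc = seq.count('gc')
  at = seq.count('atu')
  return 0 if at + gc == 0
  100 * gc / (at + gc)   # Integer division discards the fractional part
end

seq = 'atggcgtga'                  # 5 of the 9 bases are g or c
puts gc_percent_of(seq)            # prints 55
puts((100.0 * 5 / 9).round)        # prints 56
```

If a rounded percentage is needed, multiplying the Float ratio by 100 and calling `round` is one option.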
- (Object) gc_skew
Calculate the ratio of (G - C) / (G + C) bases.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_skew #=> 0.6
Returns:
Float
# File 'lib/bio/sequence/na.rb', line 333
def gc_skew
  count = self.composition
  g = count['g']
  c = count['c']
  return 0.0 if g + c == 0
  return (g - c).quo(g + c)
end
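gc_skew and at_skew apply the same formula, (x - y) / (x + y), to different base counts. A minimal plain-String sketch of that shared formula follows; `skew` is a hypothetical helper, and it uses Integer#fdiv so the result is always a Float, whereas the library source calls `quo`.

```ruby
# Hypothetical helper: (x - y) / (x + y) for two sets of bases,
# returning 0.0 when neither set occurs in the sequence.
def skew(seq, x, y)
  nx = seq.count(x)
  ny = seq.count(y)
  return 0.0 if nx + ny == 0
  (nx - ny).fdiv(nx + ny)
end

puts skew('atggcgtga', 'g', 'c')        # prints 0.6
puts skew('atgttgttgttc', 'a', 'tu')    # prints -0.75
```

Passing 'tu' as the second set treats U as T, matching the at_skew description.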
- (Object) illegal_bases
Returns an alphabetically sorted array of any non-standard bases (other than 'atgcu').
s = Bio::Sequence::NA.new('atgStgQccR')
puts s.illegal_bases #=> ["q", "r", "s"]
Returns:
Array object
# File 'lib/bio/sequence/na.rb', line 362
def illegal_bases
  self.scan(/[^atgcu]/).sort.uniq
end
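The scan in the source above flags every character outside 'atgcu', so IUPAC ambiguity codes such as 'r' or 'n' are reported as well. A plain-String sketch (`nonstandard_bases` is a hypothetical helper; it downcases first because a plain String is not normalized the way an NA object is):

```ruby
# Hypothetical helper: sorted, de-duplicated list of characters
# other than a, t, g, c and u.
def nonstandard_bases(seq)
  seq.downcase.scan(/[^atgcu]/).uniq.sort
end

p nonstandard_bases('atgStgQccR')   # prints ["q", "r", "s"]
p nonstandard_bases('atgcnn--')     # prints ["-", "n"]
```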
- (Object) molecular_weight
Estimate molecular weight (using the values from BioPerl's SeqStats.pm module).
s = Bio::Sequence::NA.new('atggcgtga')
puts s.molecular_weight #=> 2841.00708
RNA and DNA do not have the same molecular weights,
s = Bio::Sequence::NA.new('auggcguga')
puts s.molecular_weight #=> 2956.94708
Returns:
Float object
# File 'lib/bio/sequence/na.rb', line 378
def molecular_weight
  if self.rna?
    Bio::NucleicAcid.weight(self, true)
  else
    Bio::NucleicAcid.weight(self)
  end
end
- (Object) names
Generate the list of full names of the nucleotides in the sequence, in order. The names are looked up through Bio::NucleicAcid.names, as the source for this method shows.
s = Bio::Sequence::NA.new('atg')
puts s.names #=> ["Adenine", "Thymine", "Guanine"]
Returns:
Array object
# File 'lib/bio/sequence/na.rb', line 409
def names
  array = []
  self.each_byte do |x|
    array.push(Bio::NucleicAcid.names[x.chr.upcase])
  end
  return array
end
- (Object) pikachu
:nodoc:
# File 'lib/bio/sequence/compat.rb', line 91
def pikachu #:nodoc:
  self.dna.tr("atgc", "pika") # joke, of course :-)
end
- (Object) reverse_complement Also known as: complement
Returns a new sequence object with the reverse complement sequence to the original. The original sequence is not modified.
s = Bio::Sequence::NA.new('atgc')
puts s.reverse_complement #=> 'gcat'
puts s #=> 'atgc'
Returns:
new Bio::Sequence::NA object
# File 'lib/bio/sequence/na.rb', line 133
def reverse_complement
  s = self.class.new(self)
  s.reverse_complement!
  s
end
- (Object) reverse_complement! Also known as: complement!
Converts the original sequence into its reverse complement. The original sequence is modified.
s = Bio::Sequence::NA.new('atgc')
puts s.reverse_complement! #=> 'gcat'
puts s #=> 'gcat'
Returns:
current Bio::Sequence::NA object (modified)
# File 'lib/bio/sequence/na.rb', line 147
def reverse_complement!
  self.reverse!
  self.forward_complement!
end
- (Object) rna
Returns a new sequence object with any 't' bases changed to 'u'. The original sequence is not modified.
s = Bio::Sequence::NA.new('atgc')
puts s.rna #=> 'augc'
puts s #=> 'atgc'
Returns:
new Bio::Sequence::NA object
# File 'lib/bio/sequence/na.rb', line 449
def rna
  self.tr('t', 'u')
end
- (Object) rna!
Changes any 't' bases in the original sequence to 'u'. The original sequence is modified.
s = Bio::Sequence::NA.new('atgc')
puts s.rna! #=> 'augc'
puts s #=> 'augc'
Returns:
current Bio::Sequence::NA object (modified)
# File 'lib/bio/sequence/na.rb', line 461
def rna!
  self.tr!('t', 'u')
end
- (Object) splicing(position)
Alias of Bio::Sequence::Common splice method, documented there.
# File 'lib/bio/sequence/na.rb', line 84
def splicing(position) #:nodoc:
  mRNA = super
  if mRNA.rna?
    mRNA.tr!('t', 'u')
  else
    mRNA.tr!('u', 't')
  end
  mRNA
end
- (Object) to_midi(style = {}, drum = true)
style:
Hash of :tempo, :scale, :tones
scale:
C C# D D# E F F# G G# A A# B
0 1 2 3 4 5 6 7 8 9 10 11
tones:
Hash of :prog, :base, :range -- tone, vol? or len?, octaves
drum:
true (with rhythm part), false (without rhythm part)
# File 'lib/bio/shell/plugin/midi.rb', line 351
def to_midi(style = {}, drum = true)
  default = MidiTrack::Styles["Ichinose"]
  if style.is_a?(String)
    style = MidiTrack::Styles[style] || default
  end
  tempo = style[:tempo] || default[:tempo]
  scale = style[:scale] || default[:scale]
  tones = style[:tones] || default[:tones]
  track = []
  tones.each_with_index do |tone, i|
    ch = i
    ch += 1 if i >= 9 # skip rhythm track
    track.push MidiTrack.new(ch, tone[:prog], tone[:base], tone[:range], scale)
  end
  if drum
    rhythm = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    track.push(MidiTrack.new(9, 0, 35, 2, rhythm))
  end
  cur = 0
  window_search(4) do |s|
    track[cur % track.length].push(s)
    cur += 1
  end
  track.each do |t|
    t.push_silent(12)
  end
  ans = track[0].header(track.length, tempo)
  track.each do |t|
    ans += t.encode
  end
  return ans
end
- (Object) to_re
Create a Ruby regular expression instance (Regexp).
s = Bio::Sequence::NA.new('atggcgtga')
puts s.to_re #=> /atggcgtga/
Returns:
Regexp object
# File 'lib/bio/sequence/na.rb', line 393
def to_re
  if self.rna?
    Bio::NucleicAcid.to_re(self.dna, true)
  else
    Bio::NucleicAcid.to_re(self)
  end
end
- (Object) translate(frame = 1, table = 1, unknown = 'X')
Translate into an amino acid sequence.
s = Bio::Sequence::NA.new('atggcgtga')
puts s.translate #=> "MA*"
By default, translate starts in reading frame position 1, but you can start in either 2 or 3 as well,
puts s.translate(2) #=> "WR"
puts s.translate(3) #=> "GV"
You may also translate the reverse complement in one step by using frame values of -1, -2, and -3 (or 4, 5, and 6)
puts s.translate(-1) #=> "SRH"
puts s.translate(4) #=> "SRH"
puts s.reverse_complement.translate(1) #=> "SRH"
The default codon table in the translate function is the Standard Eukaryotic codon table. The translate function takes either a number or a Bio::CodonTable object for its table argument. The available tables are (NCBI):
1. "Standard (Eukaryote)"
2. "Vertebrate Mitochondrial"
3. "Yeast Mitochondorial"
4. "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma"
5. "Invertebrate Mitochondrial"
6. "Ciliate Macronuclear and Dasycladacean"
9. "Echinoderm Mitochondrial"
10. "Euplotid Nuclear"
11. "Bacteria"
12. "Alternative Yeast Nuclear"
13. "Ascidian Mitochondrial"
14. "Flatworm Mitochondrial"
15. "Blepharisma Macronuclear"
16. "Chlorophycean Mitochondrial"
21. "Trematode Mitochondrial"
22. "Scenedesmus obliquus mitochondrial"
23. "Thraustochytrium Mitochondrial"
If you are using anything other than the default table, you must specify frame in the translate method call,
puts s.translate #=> "MA*" (using defaults)
puts s.translate(1,1) #=> "MA*" (same as above, but explicit)
puts s.translate(1,2) #=> "MAW" (different codon table)
and using a Bio::CodonTable instance in the translate method call,
mt_table = Bio::CodonTable[2]
puts s.translate(1, mt_table) #=> "MAW"
By default, any invalid or unknown codons (as could happen if the sequence contains ambiguities) will be represented by 'X' in the translated sequence. You may change this to any character of your choice.
s = Bio::Sequence::NA.new('atgcNNtga')
puts s.translate #=> "MX*"
puts s.translate(1,1,'9') #=> "M9*"
The translate method considers gaps to be unknown characters and treats them as such (i.e. does not collapse sequences prior to translation), so
s = Bio::Sequence::NA.new('atgc--tga')
puts s.translate #=> "MX*"
Arguments:
-
(optional) frame: one of 1,2,3,4,5,6,-1,-2,-3 (default 1)
-
(optional) table: Integer table number from 1 to 23 (only the numbers listed above are defined) or Bio::CodonTable object (default 1)
-
(optional) unknown: Character (default 'X')
Returns:
Bio::Sequence::AA object
# File 'lib/bio/sequence/na.rb', line 234
def translate(frame = 1, table = 1, unknown = 'X')
  if table.is_a?(Bio::CodonTable)
    ct = table
  else
    ct = Bio::CodonTable[table]
  end
  naseq = self.dna
  case frame
  when 1, 2, 3
    from = frame - 1
  when 4, 5, 6
    from = frame - 4
    naseq.complement!
  when -1, -2, -3
    from = -1 - frame
    naseq.complement!
  else
    from = 0
  end
  nalen = naseq.length - from
  nalen -= nalen % 3
  aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
  return Bio::Sequence::AA.new(aaseq)
end
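The frame handling in the source above can be followed step by step in a plain-Ruby sketch. `translate_sketch` is a hypothetical helper, and its codon table is a deliberately tiny excerpt of the standard genetic code covering only the codons in the examples. A real table has 64 entries, and any codon missing here is reported as unknown.

```ruby
# Hypothetical helper mirroring the frame logic of NA#translate on a String.
def translate_sketch(seq, frame = 1, unknown = 'X')
  # Partial standard-code table, for illustration only.
  table = { 'atg' => 'M', 'gcg' => 'A', 'tga' => '*', 'tgg' => 'W',
            'cgt' => 'R', 'ggc' => 'G', 'gtg' => 'V',
            'tca' => 'S', 'cgc' => 'R', 'cat' => 'H' }
  dna = seq.downcase.tr('u', 't')
  case frame
  when 1, 2, 3
    from = frame - 1
  when 4, 5, 6
    from = frame - 4
    dna = dna.reverse.tr('atgc', 'tacg')   # reverse complement
  when -1, -2, -3
    from = -1 - frame
    dna = dna.reverse.tr('atgc', 'tacg')   # reverse complement
  else
    from = 0
  end
  len = dna.length - from
  len -= len % 3                           # drop an incomplete trailing codon
  dna[from, len].to_s.gsub(/.{3}/) { |codon| table[codon] || unknown }
end

puts translate_sketch('atggcgtga')        # prints MA*
puts translate_sketch('atggcgtga', 2)     # prints WR
puts translate_sketch('atggcgtga', 3)     # prints GV
puts translate_sketch('atggcgtga', -1)    # prints SRH
puts translate_sketch('atgcNNtga')        # prints MX*
```

Frames 4 to 6 and -1 to -3 both map to offsets 0 to 2 on the reverse complement, which is why `translate(4)` and `translate(-1)` give the same result in the examples above.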