Class: ActiveSupport::Multibyte::Chars
- Inherits: Object (Object > ActiveSupport::Multibyte::Chars)
- Includes: Comparable
- Defined in: activesupport/lib/active_support/multibyte/chars.rb
Overview
Chars enables you to work transparently with UTF-8 encoding in the Ruby String class without having extensive knowledge about the encoding. A Chars object accepts a string upon initialization and proxies String methods in an encoding-safe manner. All the normal String methods are also implemented on the proxy.
String methods are proxied through the Chars object, and can be accessed through the mb_chars method. Methods which would normally return a String object now return a Chars object so methods can be chained.
"The Perfect String ".mb_chars.downcase.strip.normalize # => "the perfect string"
Chars objects are perfectly interchangeable with String objects as long as no explicit class checks are made. If certain methods do explicitly check the class, call to_s before you pass chars objects to them.
bad.explicit_checking_method "T".mb_chars.downcase.to_s
The default Chars implementation assumes that the encoding of the string is UTF-8. If you want to handle different encodings, you can write your own multibyte string handler and configure it through ActiveSupport::Multibyte.proxy_class.
class CharsForUTF32
  def size
    @wrapped_string.size / 4
  end

  def self.accepts?(string)
    string.length % 4 == 0
  end
end

ActiveSupport::Multibyte.proxy_class = CharsForUTF32
Instance Attribute Summary
- (Object) wrapped_string
(also: #to_s, #to_str)
readonly
Returns the value of attribute wrapped_string.
Class Method Summary
+ (Boolean) consumes?(string)
Returns true when the proxy class can handle the string.
+ (Boolean) wants?(string)
Returns true if the Chars class can and should act as a proxy for the given string.
Instance Method Summary
- (Object) +(other)
Returns a new Chars object containing the other object concatenated to the string.
- (Object) <=>(other)
Returns -1, 0, or 1, depending on whether the Chars object sorts before, is equal to, or sorts after the object on the right side of the operation.
- (Object) =~(other)
Like String#=~ only it returns the character offset (in codepoints) instead of the byte offset.
- (Object) []=(*args)
Like String#[]=, except instead of byte offsets you specify character offsets.
- (Boolean) acts_like_string?
Enable more predictable duck-typing on String-like classes.
- (Object) capitalize
Converts the first character to uppercase and the remainder to lowercase.
- (Object) center(integer, padstr = ' ')
Works just like String#center, only integer specifies characters instead of bytes.
- (Object) compose
Performs composition on all the characters.
- (Object) decompose
Performs canonical decomposition on all the characters.
- (Object) downcase
Convert characters in the string to lowercase.
- (Object) g_length
Returns the number of grapheme clusters in the string.
- (Boolean) include?(other)
Returns true if the wrapped string contains other.
- (Object) index(needle, offset = 0)
Returns the position of needle in the string, counting in codepoints.
- (Chars) initialize(string)
constructor
- (Object) insert(offset, fragment)
Inserts the passed string at the specified codepoint offset.
- (Object) limit(limit)
Limit the byte size of the string to a number of bytes without breaking characters.
- (Object) ljust(integer, padstr = ' ')
Works just like String#ljust, only integer specifies characters instead of bytes.
- (Object) lstrip
Strips entire range of Unicode whitespace from the left of the string.
- (Object) method_missing(method, *args, &block)
Forward all undefined methods to the wrapped string.
- (Object) normalize(form = nil)
Returns the KC normalization of the string by default.
- (Object) ord
Returns the codepoint of the first character in the string.
- (Boolean) respond_to?(method, include_private = false)
Returns true if obj responds to the given method.
- (Object) reverse
Reverses all characters in the string.
- (Object) rindex(needle, offset = nil)
Returns the position of needle in the string, counting in codepoints, searching backward from offset or the end of the string.
- (Object) rjust(integer, padstr = ' ')
Works just like String#rjust, only integer specifies characters instead of bytes.
- (Object) rstrip
Strips entire range of Unicode whitespace from the right of the string.
- (Object) size
(also: #length)
Returns the number of codepoints in the string.
- (Object) slice(*args)
(also: #[])
Implements Unicode-aware slice with codepoints.
- (Object) split(*args)
Works just like String#split, with the exception that the items in the resulting list are Chars instances instead of String.
- (Object) strip
Strips entire range of Unicode whitespace from the right and left of the string.
- (Object) tidy_bytes(force = false)
Replaces all ISO-8859-1 or CP1252 characters by their UTF-8 equivalent resulting in a valid UTF-8 string.
- (Object) titleize
(also: #titlecase)
Capitalizes the first letter of every word, when possible.
- (Object) upcase
Convert characters in the string to uppercase.
Constructor Details
- (Chars) initialize(string)
Creates a new Chars instance that wraps string.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 43
def initialize(string) #:nodoc:
  @wrapped_string = string
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
- (Object) method_missing(method, *args, &block)
Forward all undefined methods to the wrapped string.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 54
def method_missing(method, *args, &block)
  if method.to_s =~ /!$/
    @wrapped_string.__send__(method, *args, &block)
    self
  else
    result = @wrapped_string.__send__(method, *args, &block)
    result.kind_of?(String) ? chars(result) : result
  end
end
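The forwarding pattern above can be tried without ActiveSupport. The following is a minimal sketch in plain Ruby, assuming a hypothetical MiniChars class invented here for illustration; it is not the library's implementation and it leaves out all of the Unicode handling.

```ruby
# Hypothetical stand-in for illustration; not ActiveSupport::Multibyte::Chars.
class MiniChars
  attr_reader :wrapped_string
  alias to_s wrapped_string

  def initialize(string)
    @wrapped_string = string
  end

  # Forward methods that MiniChars does not define to the wrapped string.
  def method_missing(method, *args, &block)
    return super unless @wrapped_string.respond_to?(method)

    result = @wrapped_string.__send__(method, *args, &block)
    if method.to_s.end_with?('!')
      self # bang methods mutate in place, so hand back the proxy
    else
      # Re-wrap String results so calls can be chained on the proxy.
      result.kind_of?(String) ? self.class.new(result) : result
    end
  end

  def respond_to_missing?(method, include_private = false)
    @wrapped_string.respond_to?(method, include_private) || super
  end
end

chained = MiniChars.new(' The Perfect String ').downcase.strip
chained.class # => MiniChars
chained.to_s  # => "the perfect string"
```

Non-String results, such as the Integer returned by length, are passed through unchanged.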
Instance Attribute Details
- (Object) wrapped_string (readonly) Also known as: to_s, to_str
Returns the value of attribute wrapped_string.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 37
def wrapped_string
  @wrapped_string
end
Class Method Details
+ (Boolean) consumes?(string)
Returns true when the proxy class can handle the string. Returns false otherwise.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 76
def self.consumes?(string)
  # Unpack is a little bit faster than regular expressions.
  string.unpack('U*')
  true
rescue ArgumentError
  false
end
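On Ruby versions where strings carry an encoding, a comparable check can be sketched with String#valid_encoding?. The helper name utf8_consumable? is hypothetical and is not part of ActiveSupport; this is an alternative to the unpack approach shown above, not a copy of it.

```ruby
# Hypothetical helper, not part of ActiveSupport.
# Treats the bytes as UTF-8 and reports whether they form valid UTF-8.
def utf8_consumable?(string)
  string.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

valid_result   = utf8_consumable?('Café')      # => true
invalid_result = utf8_consumable?("Caf\xE9".b) # => false, 0xE9 alone is not valid UTF-8
```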
+ (Boolean) wants?(string)
Returns true if the Chars class can and should act as a proxy for the given string. Returns false otherwise.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 100
def self.wants?(string)
  $KCODE == 'UTF8' && consumes?(string)
end
Instance Method Details
- (Object) +(other)
Returns a new Chars object containing the other object concatenated to the string.
Example:
('Café'.mb_chars + ' périferôl').to_s # => "Café périferôl"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 108
def +(other)
  chars(@wrapped_string + other)
end
- (Object) <=>(other)
Returns -1, 0, or 1, depending on whether the Chars object sorts before, is equal to, or sorts after the object on the right side of the operation. It accepts any object that implements to_s:
'é'.mb_chars <=> 'ü'.mb_chars # => -1
See String#<=> for more details.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 93
def <=>(other)
  @wrapped_string <=> other.to_s
end
- (Object) =~(other)
Like String#=~ only it returns the character offset (in codepoints) instead of the byte offset.
Example:
'Café périferôl'.mb_chars =~ /ô/ # => 12
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 116
def =~(other)
  @wrapped_string =~ other
end
- (Object) []=(*args)
Like String#[]=, except instead of byte offsets you specify character offsets.
Example:
s = "Müller"
s.mb_chars[2] = "e" # Replace character with offset 2
s
# => "Müeler"
s = "Müller"
s.mb_chars[1, 2] = "ö" # Replace 2 characters at character offset 1
s
# => "Möler"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 266
def []=(*args)
  replace_by = args.pop
  # Indexed replace with regular expressions already works
  if args.first.is_a?(Regexp)
    @wrapped_string[*args] = replace_by
  else
    result = Unicode.u_unpack(@wrapped_string)
    case args.first
    when Fixnum
      raise IndexError, "index #{args[0]} out of string" if args[0] >= result.length
      min = args[0]
      max = args[1].nil? ? min : (min + args[1] - 1)
      range = Range.new(min, max)
      replace_by = [replace_by].pack('U') if replace_by.is_a?(Fixnum)
    when Range
      raise RangeError, "#{args[0]} out of range" if args[0].min >= result.length
      range = args[0]
    else
      needle = args[0].to_s
      min = index(needle)
      max = min + Unicode.u_unpack(needle).length - 1
      range = Range.new(min, max)
    end
    result[range] = Unicode.u_unpack(replace_by)
    @wrapped_string.replace(result.pack('U*'))
  end
end
- (Boolean) acts_like_string?
Enable more predictable duck-typing on String-like classes. See Object#acts_like?.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 71
def acts_like_string?
  true
end
- (Object) capitalize
Converts the first character to uppercase and the remainder to lowercase.
Example:
'über'.mb_chars.capitalize.to_s # => "Über"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 359
def capitalize
  (slice(0) || chars('')).upcase + (slice(1..-1) || chars('')).downcase
end
- (Object) center(integer, padstr = ' ')
Works just like String#center, only integer specifies characters instead of bytes.
Example:
"¾ cup".mb_chars.center(8).to_s
# => " ¾ cup  "
"¾ cup".mb_chars.center(8, " ").to_s # Use non-breaking whitespace
# => " ¾ cup  "
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 234
def center(integer, padstr=' ')
  justify(integer, :center, padstr)
end
- (Object) compose
Performs composition on all the characters.
Example:
'é'.length # => 3
'é'.mb_chars.compose.to_s.length # => 2
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 397
def compose
  chars(Unicode.compose_codepoints(Unicode.u_unpack(@wrapped_string)).pack('U*'))
end
- (Object) decompose
Performs canonical decomposition on all the characters.
Example:
'é'.length # => 2
'é'.mb_chars.decompose.to_s.length # => 3
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 388
def decompose
  chars(Unicode.decompose_codepoints(:canonical, Unicode.u_unpack(@wrapped_string)).pack('U*'))
end
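Composition and canonical decomposition correspond to the Unicode normalization forms NFC and NFD. As a rough illustration that does not depend on ActiveSupport, the sketch below uses Ruby's built-in String#unicode_normalize with explicit escapes so the two forms of é are easy to tell apart; the variable names are chosen for this example only.

```ruby
composed   = "\u00E9"  # é as a single codepoint
decomposed = "e\u0301" # e followed by U+0301 COMBINING ACUTE ACCENT

nfd = composed.unicode_normalize(:nfd)   # canonical decomposition
nfc = decomposed.unicode_normalize(:nfc) # canonical composition

nfd == decomposed # => true
nfc == composed   # => true
nfd.length        # => 2 (codepoints)
nfc.length        # => 1 (codepoint)
```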
- (Object) downcase
Convert characters in the string to lowercase.
Example:
'VĚDA A VÝZKUM'.mb_chars.downcase.to_s # => "věda a výzkum"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 351
def downcase
  chars(Unicode.apply_mapping @wrapped_string, :lowercase_mapping)
end
- (Object) g_length
Returns the number of grapheme clusters in the string.
Example:
'क्षि'.mb_chars.length # => 4
'क्षि'.mb_chars.g_length # => 3
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 406
def g_length
  Unicode.g_unpack(@wrapped_string).length
end
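The difference between bytes, codepoints, and grapheme clusters can also be seen with core String methods alone. This sketch uses a simple combining sequence rather than the Devanagari example above, because grapheme segmentation of Indic scripts may differ between Ruby and Unicode versions.

```ruby
s = "e\u0301" # "e" followed by U+0301 COMBINING ACUTE ACCENT

bytes     = s.bytesize                 # => 3 (UTF-8 bytes)
codes     = s.length                   # => 2 (codepoints)
graphemes = s.grapheme_clusters.length # => 1 (user-perceived character)
```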
- (Boolean) include?(other)
Returns true if the wrapped string contains other. Returns false otherwise.
Example:
'Café'.mb_chars.include?('é') # => true
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 140
def include?(other)
  # We have to redefine this method because Enumerable defines it.
  @wrapped_string.include?(other)
end
- (Object) index(needle, offset = 0)
Returns the position of needle in the string, counting in codepoints. Returns nil if needle isn't found.
Example:
'Café périferôl'.mb_chars.index('ô') # => 12
'Café périferôl'.mb_chars.index(/\w/u) # => 0
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 150
def index(needle, offset=0)
  wrapped_offset = first(offset).wrapped_string.length
  index = @wrapped_string.index(needle, wrapped_offset)
  index ? (Unicode.u_unpack(@wrapped_string.slice(0...index)).size) : nil
end
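The translation from a byte offset to a codepoint offset that index performs can be illustrated in plain Ruby. This is a sketch of the idea only: it uses String#b and String#byteslice, which are core Ruby methods, not the Unicode.u_unpack helper from the source above.

```ruby
text = 'Café périferôl'

# Search the raw bytes to get an offset counted in bytes.
byte_offset = text.b.index('ô'.b)
# Count the codepoints that precede that byte offset.
char_offset = text.byteslice(0, byte_offset).length

byte_offset # => 14 (each "é" before the match occupies two bytes)
char_offset # => 12
```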
- (Object) insert(offset, fragment)
Inserts the passed string at the specified codepoint offset.
Example:
'Café'.mb_chars.insert(4, ' périferôl').to_s # => "Café périferôl"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 124
def insert(offset, fragment)
  unpacked = Unicode.u_unpack(@wrapped_string)
  unless offset > unpacked.length
    @wrapped_string.replace(
      Unicode.u_unpack(@wrapped_string).insert(offset, *Unicode.u_unpack(fragment)).pack('U*')
    )
  else
    raise IndexError, "index #{offset} out of string"
  end
  self
end
- (Object) limit(limit)
Limit the byte size of the string to a number of bytes without breaking characters. Usable when the storage for a string is limited for some reason.
Example:
'こんにちは'.mb_chars.limit(7).to_s # => "こん"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 335
def limit(limit)
  slice(0...translate_offset(limit))
end
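One way to picture byte-limited truncation is to keep whole characters until the next one would exceed the budget. The helper below, limit_bytes, is a hypothetical name used only for this sketch; it is written in plain Ruby and is not how the library implements limit.

```ruby
# Hypothetical helper, not part of ActiveSupport.
# Keeps whole characters while the result stays within max_bytes.
def limit_bytes(string, max_bytes)
  kept = String.new(encoding: string.encoding)
  string.each_char do |char|
    break if kept.bytesize + char.bytesize > max_bytes
    kept << char
  end
  kept
end

limited = limit_bytes('こんにちは', 7) # => "こん" (each character is 3 bytes)
```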
- (Object) ljust(integer, padstr = ' ')
Works just like String#ljust, only integer specifies characters instead of bytes.
Example:
"¾ cup".mb_chars.ljust(8).to_s
# => "¾ cup   "
"¾ cup".mb_chars.ljust(8, " ").to_s # Use non-breaking whitespace
# => "¾ cup   "
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 221
def ljust(integer, padstr=' ')
  justify(integer, :left, padstr)
end
- (Object) lstrip
Strips entire range of Unicode whitespace from the left of the string.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 182
def lstrip
  chars(@wrapped_string.gsub(Unicode::LEADERS_PAT, ''))
end
- (Object) normalize(form = nil)
Returns the KC normalization of the string by default. NFKC is considered the best normalization form for passing strings to databases and validations.
- form: The form you want to normalize in. Should be one of the following: :c, :kc, :d, or :kd. Default is ActiveSupport::Multibyte::Unicode.default_normalization_form.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 379
def normalize(form = nil)
  chars(Unicode.normalize(@wrapped_string, form))
end
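To see what the compatibility (K) forms add over the canonical ones, the sketch below uses Ruby's built-in String#unicode_normalize rather than the Chars API. The ligature is just one convenient example of a codepoint with a compatibility mapping.

```ruby
ligature = "\uFB01" # LATIN SMALL LIGATURE FI

canonical     = ligature.unicode_normalize(:nfc)  # unchanged, no canonical mapping
compatibility = ligature.unicode_normalize(:nfkc) # folded to plain letters

canonical == ligature # => true
compatibility         # => "fi"
```

Folding such variants to a single representation is the kind of behaviour that makes a KC-style form attractive for stored and validated values.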
- (Object) ord
Returns the codepoint of the first character in the string.
Example:
'こんにちは'.mb_chars.ord # => 12371
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 195
def ord
  Unicode.u_unpack(@wrapped_string)[0]
end
- (Boolean) respond_to?(method, include_private = false)
Returns true if obj responds to the given method. Private methods are included in the search only if the optional second parameter evaluates to true.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 66
def respond_to?(method, include_private=false)
  super || @wrapped_string.respond_to?(method, include_private)
end
- (Object) reverse
Reverses all characters in the string.
Example:
'Café'.mb_chars.reverse.to_s # => 'éfaC'
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 298
def reverse
  chars(Unicode.g_unpack(@wrapped_string).reverse.flatten.pack('U*'))
end
- (Object) rindex(needle, offset = nil)
Returns the position of needle in the string, counting in codepoints, searching backward from offset or the end of the string. Returns nil if needle isn't found.
Example:
'Café périferôl'.mb_chars.rindex('é') # => 6
'Café périferôl'.mb_chars.rindex(/\w/u) # => 13
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 163
def rindex(needle, offset=nil)
  offset ||= length
  wrapped_offset = first(offset).wrapped_string.length
  index = @wrapped_string.rindex(needle, wrapped_offset)
  index ? (Unicode.u_unpack(@wrapped_string.slice(0...index)).size) : nil
end
- (Object) rjust(integer, padstr = ' ')
Works just like String#rjust, only integer specifies characters instead of bytes.
Example:
"¾ cup".mb_chars.rjust(8).to_s
# => "   ¾ cup"
"¾ cup".mb_chars.rjust(8, " ").to_s # Use non-breaking whitespace
# => "   ¾ cup"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 208
def rjust(integer, padstr=' ')
  justify(integer, :right, padstr)
end
- (Object) rstrip
Strips entire range of Unicode whitespace from the right of the string.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 177
def rstrip
  chars(@wrapped_string.gsub(Unicode::TRAILERS_PAT, ''))
end
- (Object) size Also known as: length
Returns the number of codepoints in the string.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 171
def size
  Unicode.u_unpack(@wrapped_string).size
end
- (Object) slice(*args) Also known as: []
Implements Unicode-aware slice with codepoints. Slicing on one point returns the codepoints for that character.
Example:
'こんにちは'.mb_chars.slice(2..3).to_s # => "にち"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 307
def slice(*args)
  if args.size > 2
    raise ArgumentError, "wrong number of arguments (#{args.size} for 1)" # Do as if we were native
  elsif (args.size == 2 && !(args.first.is_a?(Numeric) || args.first.is_a?(Regexp)))
    raise TypeError, "cannot convert #{args.first.class} into Integer" # Do as if we were native
  elsif (args.size == 2 && !args[1].is_a?(Numeric))
    raise TypeError, "cannot convert #{args[1].class} into Integer" # Do as if we were native
  elsif args[0].kind_of? Range
    cps = Unicode.u_unpack(@wrapped_string).slice(*args)
    result = cps.nil? ? nil : cps.pack('U*')
  elsif args[0].kind_of? Regexp
    result = @wrapped_string.slice(*args)
  elsif args.size == 1 && args[0].kind_of?(Numeric)
    character = Unicode.u_unpack(@wrapped_string)[args[0]]
    result = character && [character].pack('U')
  else
    cps = Unicode.u_unpack(@wrapped_string).slice(*args)
    result = cps && cps.pack('U*')
  end
  result && chars(result)
end
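The unpack, slice, and pack round trip at the heart of slice can be reproduced with core Ruby. In this sketch String#unpack('U*') stands in for Unicode.u_unpack, on the assumption that both yield one Integer per codepoint.

```ruby
codepoints = 'こんにちは'.unpack('U*') # one Integer per codepoint
sliced     = codepoints[2..3].pack('U*') # back to a UTF-8 string

codepoints.length # => 5
sliced            # => "にち"
```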
- (Object) split(*args)
Works just like String#split, with the exception that the items in the resulting list are Chars instances instead of String. This makes chaining methods easier.
Example:
'Café périferôl'.mb_chars.split(/é/).map { |part| part.upcase.to_s } # => ["CAF", " P", "RIFERÔL"]
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 249
def split(*args)
  @wrapped_string.split(*args).map { |i| i.mb_chars }
end
- (Object) strip
Strips entire range of Unicode whitespace from the right and left of the string.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 187
def strip
  rstrip.lstrip
end
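Core String#strip only removes ASCII whitespace and null characters, which is why a Unicode-aware variant is useful. The sketch below approximates the idea with the POSIX bracket class [[:space:]]; that pattern is an assumption made for this example and is not the library's LEADERS_PAT or TRAILERS_PAT.

```ruby
padded = "\u00A0 abc\u3000" # NO-BREAK SPACE, space, "abc", IDEOGRAPHIC SPACE

ascii_stripped   = padded.strip # leaves the string untouched here
unicode_stripped = padded.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')

ascii_stripped == padded # => true
unicode_stripped         # => "abc"
```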
- (Object) tidy_bytes(force = false)
Replaces all ISO-8859-1 or CP1252 characters by their UTF-8 equivalent resulting in a valid UTF-8 string.
Passing true will forcibly tidy all bytes, assuming that the string's encoding is entirely CP1252 or ISO-8859-1.
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 413
def tidy_bytes(force = false)
  chars(Unicode.tidy_bytes(@wrapped_string, force))
end
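The forced case, where every byte is assumed to be CP1252, resembles a plain transcoding step. The following sketch uses Ruby's String#force_encoding and String#encode to show that step in isolation; it does not attempt the mixed case, where valid UTF-8 sequences are left alone and only stray bytes are converted.

```ruby
raw    = "Caf\xE9".b # bytes as they might arrive from a legacy source
tidied = raw.force_encoding('Windows-1252').encode('UTF-8')

tidied                 # => "Café"
tidied.valid_encoding? # => true
```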
- (Object) titleize Also known as: titlecase
Capitalizes the first letter of every word, when possible.
Example:
"ÉL QUE SE ENTERÓ".mb_chars.titleize # => "Él Que Se Enteró"
"日本語".mb_chars.titleize # => "日本語"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 368
def titleize
  chars(downcase.to_s.gsub(/\b('?[\S])/u) { Unicode.apply_mapping $1, :uppercase_mapping })
end
- (Object) upcase
Convert characters in the string to uppercase.
Example:
'Laurent, où sont les tests ?'.mb_chars.upcase.to_s # => "LAURENT, OÙ SONT LES TESTS ?"
# File 'activesupport/lib/active_support/multibyte/chars.rb', line 343
def upcase
  chars(Unicode.apply_mapping @wrapped_string, :uppercase_mapping)
end