Class: PDF::Reader::ObjectHash
- Inherits: Object
- Includes: Enumerable
- Defined in: lib/pdf/reader/object_hash.rb
Overview
Provides low level access to the objects in a PDF file via a hash-like object.
A PDF file can be viewed as a large hash map. It is a series of objects stored at precise byte offsets, and a table that maps object IDs to byte offsets. Given an object ID, looking up an object is an O(1) operation.
Each PDF object can be mapped to a Ruby object, so passing an object ID to the [] method retrieves a Ruby representation of that object.
The class behaves much like a standard Ruby hash, including the use of the Enumerable mixin. The key difference is that there is no []= method: the hash is read-only.
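The lookup model described in this overview can be sketched with a toy stand-in. This is not PDF::Reader code: the class name ToyObjectStore, its offsets table and the one-record-per-line layout are all assumptions made for illustration. It only shows how an ID-to-offset table gives constant-time lookup behind a read-only, Enumerable interface.

```ruby
require "stringio"

# Hypothetical stand-in for illustration only; not part of PDF::Reader.
# Records live at byte offsets in an IO, and a table maps IDs to offsets.
class ToyObjectStore
  include Enumerable

  def initialize(io, offsets)
    @io = io            # IO-like object holding the raw records
    @offsets = offsets  # { object id => byte offset }
  end

  # One hash lookup plus one seek, however many records exist.
  def [](id)
    offset = @offsets[id]
    return nil if offset.nil?
    @io.seek(offset)
    @io.gets.chomp
  end

  # Enumerable only needs #each; map, select, to_a and so on follow from it.
  def each
    @offsets.each_key { |id| yield id, self[id] }
  end
end

data = "first\nsecond\nthird\n"
store = ToyObjectStore.new(StringIO.new(data), { 1 => 0, 2 => 6, 3 => 13 })

store[2]                   # => "second"
store.map { |id, _| id }   # => [1, 2, 3]
store.respond_to?(:[]=)    # => false, there is no writer method
```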
Basic Usage
h = PDF::Reader::ObjectHash.new("somefile.pdf")
h[1]
=> 3469
h[PDF::Reader::Reference.new(1,0)]
=> 3469
Instance Attribute Summary
- (Object) default
  Returns the value of attribute default.
- (Object) pdf_version (readonly)
  Returns the value of attribute pdf_version.
- (Object) trailer (readonly)
  Returns the value of attribute trailer.
Instance Method Summary
- (Object) [](key)
  Access an object from the PDF.
- (Object) deref!(key)
  Recursively dereferences the object referred to by key.
- (Object) each(&block) (also: #each_pair)
  iterate over each key, value.
- (Object) each_key(&block)
  iterate over each key.
- (Object) each_value(&block)
  iterate over each value.
- (Boolean) empty?
  return true if there are no objects in this file.
- (Boolean) encrypted?
- (Object) fetch(key, local_default = nil)
  Access an object from the PDF.
- (Boolean) has_key?(check_key) (also: #include?, #key?, #member?, #value?)
  return true if the specified key exists in the file.
- (Boolean) has_value?(value)
  return true if the specified value exists in the file.
- (ObjectHash) initialize(input, opts = {}) (constructor)
  Creates a new ObjectHash object.
- (Object) keys
  return an array of all keys in the file.
- (Object) obj_type(ref)
  returns the type of object a ref points to.
- (Object) object(key) (also: #deref)
  If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
- (Object) page_references
  returns an array of PDF::Reader::References.
- (Object) size (also: #length)
  return the number of objects in the file.
- (Boolean) stream?(ref)
  returns true if the supplied reference points to an object with a stream.
- (Object) to_a
  return an array of arrays.
- (Object) to_s
- (Object) values
  return an array of all values in the file.
- (Object) values_at(*ids)
  return an array of all values from the specified keys.
Constructor Details
- (ObjectHash) initialize(input, opts = {})
Creates a new ObjectHash object. Input can be a string with a valid filename or an IO-like object.
Valid options:
:password - the user password to decrypt the source PDF
    # File 'lib/pdf/reader/object_hash.rb', line 41
    def initialize(input, opts = {})
      @io = extract_io_from(input)
      @pdf_version = read_version
      @xref = PDF::Reader::XRef.new(@io)
      @trailer = @xref.trailer
      @cache = PDF::Reader::ObjectCache.new
      @sec_handler = build_security_handler(opts)
    end
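The constructor accepts either a filename or an IO-like object, but the body of extract_io_from is not shown in this listing. The sketch below is one plausible way to normalize the two input kinds; toy_extract_io and its error message are hypothetical, and the real helper may behave differently.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper for illustration; the real extract_io_from is not
# shown in this listing and may differ.
def toy_extract_io(input)
  if input.respond_to?(:seek) && input.respond_to?(:read)
    input                                   # already IO-like, use as-is
  elsif File.file?(input.to_s)
    StringIO.new(File.binread(input.to_s))  # read the file into memory
  else
    raise ArgumentError, "input must be an IO-like object or a filename"
  end
end

# An IO-like object is passed straight through.
io = StringIO.new("%PDF-1.4")
toy_extract_io(io).equal?(io)   # => true

# A filename is read and its bytes wrapped in a StringIO.
Tempfile.create("toy") do |f|
  f.write("%PDF-1.4")
  f.flush
  toy_extract_io(f.path).read   # => "%PDF-1.4"
end
```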
Instance Attribute Details
- (Object) default
Returns the value of attribute default
    # File 'lib/pdf/reader/object_hash.rb', line 31
    def default
      @default
    end
- (Object) pdf_version (readonly)
Returns the value of attribute pdf_version
    # File 'lib/pdf/reader/object_hash.rb', line 32
    def pdf_version
      @pdf_version
    end
- (Object) trailer (readonly)
Returns the value of attribute trailer
    # File 'lib/pdf/reader/object_hash.rb', line 32
    def trailer
      @trailer
    end
Instance Method Details
- (Object) [](key)
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
    # File 'lib/pdf/reader/object_hash.rb', line 71
    def [](key)
      return default if key.to_i <= 0
      unless key.is_a?(PDF::Reader::Reference)
        key = PDF::Reader::Reference.new(key.to_i, 0)
      end

      if @cache.has_key?(key)
        @cache[key]
      elsif xref[key].is_a?(Fixnum)
        buf = new_buffer(xref[key])
        @cache[key] = decrypt(key, Parser.new(buf, self).object(key.id, key.gen))
      elsif xref[key].is_a?(PDF::Reader::Reference)
        container_key = xref[key]
        object_streams[container_key] ||= PDF::Reader::ObjectStream.new(object(container_key))
        @cache[key] = object_streams[container_key][key.id]
      end
    rescue InvalidObjectError
      return default
    end
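The key handling at the top of [] can be isolated in a small sketch. KeyRef is a hypothetical stand-in for PDF::Reader::Reference (assumed to respond to to_i, as the listing implies), normalize_key is an invented name, and nil stands in for the default attribute.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference, for illustration only.
KeyRef = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

# Mirrors the first lines of the listing: non-positive keys are rejected,
# bare integers become generation-0 references, references pass through.
def normalize_key(key)
  return nil if key.to_i <= 0        # stands in for `return default`
  key.is_a?(KeyRef) ? key : KeyRef.new(key.to_i, 0)
end

normalize_key(1) == KeyRef.new(1, 0)   # => true
normalize_key(KeyRef.new(7, 2)).gen    # => 2, generation is preserved
normalize_key(0)                       # => nil
```

Because both forms end up as the same reference, an integer key and an equivalent generation-0 reference resolve to the same object.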
- (Object) deref!(key)
Recursively dereferences the object referred to by key. If key is not a PDF::Reader::Reference, the key is returned unchanged.
    # File 'lib/pdf/reader/object_hash.rb', line 103
    def deref!(key)
      case object = deref(key)
      when Hash
        object.each do |key, value|
          object[key] = deref! value
        end
      when PDF::Reader::Stream
        deref! object.hash
      when Array
        object.each_with_index do |value, index|
          object[index] = deref! value
        end
      end

      object
    end
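Recursive dereferencing is easier to follow on a tiny in-memory object table. In this sketch ObjRef, OBJECTS, toy_deref and toy_deref! are all hypothetical stand-ins, and streams are left out for brevity.

```ruby
# Hypothetical stand-ins for illustration; not PDF::Reader classes.
ObjRef = Struct.new(:id)

OBJECTS = {
  1 => { :Type => :Catalog, :Pages => ObjRef.new(2) },
  2 => { :Type => :Pages, :Kids => [ObjRef.new(3)], :Count => 1 },
  3 => { :Type => :Page, :Rotate => 0 },
}

# Resolve one level: references are looked up, anything else passes through.
def toy_deref(value)
  value.is_a?(ObjRef) ? OBJECTS[value.id] : value
end

# Resolve recursively, replacing nested references in place.
# Like the listing, this has no cycle detection, so a structure that
# refers back to itself would recurse without end.
def toy_deref!(value)
  object = toy_deref(value)
  case object
  when Hash
    object.each { |k, v| object[k] = toy_deref!(v) }
  when Array
    object.each_with_index { |v, i| object[i] = toy_deref!(v) }
  end
  object
end

catalog = toy_deref!(ObjRef.new(1))
catalog[:Pages][:Kids].first[:Type]   # => :Page
toy_deref!(42)                        # => 42, non-references are returned unchanged
```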
- (Object) each(&block) Also known as: each_pair
iterate over each key, value. Just like a ruby hash.
    # File 'lib/pdf/reader/object_hash.rb', line 149
    def each(&block)
      @xref.each do |ref|
        yield ref, self[ref]
      end
    end
- (Object) each_key(&block)
iterate over each key. Just like a ruby hash.
    # File 'lib/pdf/reader/object_hash.rb', line 158
    def each_key(&block)
      each do |id, obj|
        yield id
      end
    end
- (Object) each_value(&block)
iterate over each value. Just like a ruby hash.
    # File 'lib/pdf/reader/object_hash.rb', line 166
    def each_value(&block)
      each do |id, obj|
        yield obj
      end
    end
- (Boolean) empty?
return true if there are no objects in this file
    # File 'lib/pdf/reader/object_hash.rb', line 181
    def empty?
      size == 0 ? true : false
    end
- (Boolean) encrypted?
    # File 'lib/pdf/reader/object_hash.rb', line 261
    def encrypted?
      trailer.has_key?(:Encrypt)
    end
- (Object) fetch(key, local_default = nil)
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
local_default is the object that will be returned if the requested key doesn't exist.
    # File 'lib/pdf/reader/object_hash.rb', line 136
    def fetch(key, local_default = nil)
      obj = self[key]
      if obj
        return obj
      elsif local_default
        return local_default
      else
        raise IndexError, "#{key} is invalid" if key.to_i <= 0
      end
    end
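The control flow of fetch differs from Ruby's Hash#fetch in one respect that is easy to miss. The sketch below follows the listing's branches over a plain Hash; TABLE and toy_fetch are hypothetical names, and the lookup line is a simplification of [].

```ruby
# Hypothetical stand-in: a plain Hash plays the role of the object table.
TABLE = { 1 => "catalog", 2 => "pages" }

# Follows the branches of the listing above.
def toy_fetch(key, local_default = nil)
  obj = key.to_i > 0 ? TABLE[key.to_i] : nil
  if obj
    obj
  elsif local_default
    local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end

toy_fetch(1)               # => "catalog"
toy_fetch(9, "fallback")   # => "fallback"
toy_fetch(9)               # => nil, a positive but unknown ID does not raise

begin
  toy_fetch(0)
rescue IndexError => e
  e.message                # => "0 is invalid"
end
```

As written, the exception is reserved for non-positive keys; a missing positive ID with no default yields nil, whereas Hash#fetch would raise KeyError.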
- (Boolean) has_key?(check_key) Also known as: include?, key?, member?, value?
return true if the specified key exists in the file. check_key can be an int or a PDF::Reader::Reference.
    # File 'lib/pdf/reader/object_hash.rb', line 188
    def has_key?(check_key)
      # TODO update from O(n) to O(1)
      each_key do |key|
        if check_key.kind_of?(PDF::Reader::Reference)
          return true if check_key == key
        else
          return true if check_key.to_i == key.id
        end
      end
      return false
    end
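The TODO in the listing asks for an O(1) check. One possible approach, sketched here purely as an illustration, is to index the known references once and answer both kinds of query from sets. IdxRef and ToyKeyIndex are hypothetical; nothing here is taken from the library.

```ruby
require "set"

# Hypothetical stand-in for PDF::Reader::Reference, for illustration only.
IdxRef = Struct.new(:id, :gen)

# Hypothetical index built once from the known references.
class ToyKeyIndex
  def initialize(refs)
    @refs = Set.new(refs)             # exact (id, gen) matches
    @ids  = Set.new(refs.map(&:id))   # matches by ID alone
  end

  # Same two cases as the listing: a reference must match exactly,
  # anything else is compared by integer ID.
  def has_key?(check_key)
    if check_key.is_a?(IdxRef)
      @refs.include?(check_key)
    else
      @ids.include?(check_key.to_i)
    end
  end
end

index = ToyKeyIndex.new([IdxRef.new(1, 0), IdxRef.new(2, 1)])
index.has_key?(2)                  # => true
index.has_key?(IdxRef.new(2, 0))   # => false, the generation differs
index.has_key?(IdxRef.new(2, 1))   # => true
index.has_key?(5)                  # => false
```

The trade-off is extra memory and the need to build the index up front, in exchange for set lookups instead of a scan over every key.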
- (Boolean) has_value?(value)
return true if the specified value exists in the file
    # File 'lib/pdf/reader/object_hash.rb', line 205
    def has_value?(value)
      # TODO update from O(n) to O(1)
      each_value do |obj|
        return true if obj == value
      end
      return false
    end
- (Object) keys
return an array of all keys in the file
    # File 'lib/pdf/reader/object_hash.rb', line 220
    def keys
      ret = []
      each_key { |k| ret << k }
      ret
    end
- (Object) obj_type(ref)
returns the type of object a ref points to
    # File 'lib/pdf/reader/object_hash.rb', line 51
    def obj_type(ref)
      self[ref].class.to_s.to_sym
    rescue
      nil
    end
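The symbol returned by obj_type is simply the class name of the Ruby object. The sketch below applies the same expression to ordinary values; Toy::Stream and toy_obj_type are hypothetical names used only to show what a namespaced class produces.

```ruby
# Hypothetical stand-in class, used only to show a namespaced name.
module Toy
  class Stream; end
end

# Same expression as the listing, applied to an object directly.
def toy_obj_type(obj)
  obj.class.to_s.to_sym
end

toy_obj_type({})                # => :Hash
toy_obj_type([])                # => :Array
toy_obj_type(Toy::Stream.new)   # => :"Toy::Stream"
toy_obj_type(nil)               # => :NilClass
```

The last line matters for missing objects: if a lookup returns nil rather than raising, the expression yields :NilClass, and the rescue branch only applies when an exception is raised.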
- (Object) object(key) Also known as: deref
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
    # File 'lib/pdf/reader/object_hash.rb', line 95
    def object(key)
      key.is_a?(PDF::Reader::Reference) ? self[key] : key
    end
- (Object) page_references
returns an array of PDF::Reader::References. Each reference in the array points to a Page object, one for each page in the PDF. The first reference is page 1, the second reference is page 2, etc.
Useful for apps that want to extract data from specific pages.
    # File 'lib/pdf/reader/object_hash.rb', line 256
    def page_references
      root = fetch(trailer[:Root])
      @page_references ||= get_page_objects(root[:Pages]).flatten
    end
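The helper get_page_objects is not shown in this listing, so the following is only a guess at the kind of traversal that would need the trailing flatten. NodeRef, TREE and toy_page_objects are hypothetical; the sketch walks a nested page tree and collects leaf references in page order.

```ruby
# Hypothetical stand-ins for illustration; not PDF::Reader classes.
NodeRef = Struct.new(:id)

TREE = {
  1 => { :Type => :Pages, :Kids => [NodeRef.new(2), NodeRef.new(3)] },
  2 => { :Type => :Pages, :Kids => [NodeRef.new(4), NodeRef.new(5)] },
  3 => { :Type => :Page },
  4 => { :Type => :Page },
  5 => { :Type => :Page },
}

# Hypothetical walker: returns the reference for a leaf page, or a nested
# array of results for an intermediate :Pages node.
def toy_page_objects(ref)
  obj = TREE[ref.id]
  case obj[:Type]
  when :Page  then ref
  when :Pages then obj[:Kids].map { |kid| toy_page_objects(kid) }
  end
end

refs = toy_page_objects(NodeRef.new(1)).flatten
refs.map(&:id)   # => [4, 5, 3]
```

Intermediate nodes produce nested arrays, which is why the result is flattened; the first element then corresponds to page 1.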
- (Object) size Also known as: length
return the number of objects in the file. An object with multiple generations is counted once.
    # File 'lib/pdf/reader/object_hash.rb', line 174
    def size
      xref.size
    end
- (Boolean) stream?(ref)
returns true if the supplied reference points to an object with a stream
    # File 'lib/pdf/reader/object_hash.rb', line 58
    def stream?(ref)
      self.has_key?(ref) && self[ref].is_a?(PDF::Reader::Stream)
    end
- (Object) to_a
return an array of arrays. Each sub array contains a key/value pair.
    # File 'lib/pdf/reader/object_hash.rb', line 242
    def to_a
      ret = []
      each do |id, obj|
        ret << [id, obj]
      end
      ret
    end
- (Object) to_s
    # File 'lib/pdf/reader/object_hash.rb', line 214
    def to_s
      "<PDF::Reader::ObjectHash size: #{self.size}>"
    end
- (Object) values
return an array of all values in the file
    # File 'lib/pdf/reader/object_hash.rb', line 228
    def values
      ret = []
      each_value { |v| ret << v }
      ret
    end
- (Object) values_at(*ids)
return an array of all values from the specified keys
    # File 'lib/pdf/reader/object_hash.rb', line 236
    def values_at(*ids)
      ids.map { |id| self[id] }
    end