Class: PDF::Reader::ObjectHash

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/pdf/reader/object_hash.rb

Overview

Provides low level access to the objects in a PDF file via a hash-like object.

A PDF file can be viewed as a large hash map. It is a series of objects stored at precise byte offsets, and a table that maps object IDs to byte offsets. Given an object ID, looking up an object is an O(1) operation.
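To make the offset-table idea concrete, here is a minimal Ruby sketch. The tiny "file" in BODY, the XREF hash and the lookup helper are all hypothetical. They use a heavily simplified object syntax (one-line bodies, no real xref section) and only illustrate why a lookup costs one hash access plus one seek.

```ruby
require "stringio"

# A toy "file" holding two indirect objects back to back. This is a heavily
# simplified stand-in for PDF syntax, for illustration only.
BODY = "1 0 obj\n3469\nendobj\n2 0 obj\n(hello)\nendobj\n"

# Hypothetical xref table: object ID => byte offset of its "N 0 obj" header.
XREF = {
  1 => BODY.index("1 0 obj"),
  2 => BODY.index("2 0 obj")
}

# One hash lookup plus one seek, independent of how many objects the file holds.
def lookup(io, xref, id)
  io.seek(xref.fetch(id))
  io.gets            # skip the "N 0 obj" header line
  io.gets.chomp      # this toy format keeps each body on a single line
end

io = StringIO.new(BODY)
puts lookup(io, XREF, 2)  # prints (hello)
puts lookup(io, XREF, 1)  # prints 3469
```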

Each PDF object can be mapped to a Ruby object, so passing an object ID to the [] method retrieves a Ruby representation of that object.

The class behaves much like a standard Ruby hash, including the use of the Enumerable mixin. The key difference is that there is no []= method: the hash is read-only.
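As a rough illustration of that contract, the following Ruby sketch builds a tiny read-only, Enumerable hash. ToyRef and ToyObjectHash are hypothetical names, not part of pdf-reader; the class only mimics the behaviour described above: Integer keys imply generation 0, Enumerable methods are derived from each, and there is no []=.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference: an object ID plus a
# generation number. Struct gives value-based == and #hash for free.
ToyRef = Struct.new(:id, :gen)

class ToyObjectHash
  include Enumerable

  def initialize(objects)
    @objects = objects.dup.freeze   # { ToyRef => value }, never modified later
  end

  # Integer keys are treated as (id, 0); ToyRef keys match exactly.
  def [](key)
    key = ToyRef.new(key.to_i, 0) unless key.is_a?(ToyRef)
    @objects[key]
  end

  # Yields key, value pairs; every Enumerable method is built on this.
  def each
    return enum_for(:each) unless block_given?
    @objects.each { |ref, obj| yield ref, obj }
  end

  # Deliberately no []= method: the collection is read-only.
end

h = ToyObjectHash.new({ ToyRef.new(1, 0) => 3469, ToyRef.new(2, 0) => "hello" })
h[1]                           # => 3469
h[ToyRef.new(1, 0)]            # => 3469
h.map { |ref, _obj| ref.id }   # => [1, 2]
```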

Basic Usage

h = PDF::Reader::ObjectHash.new("somefile.pdf")
h[1]
=> 3469

h[PDF::Reader::Reference.new(1,0)]
=> 3469

Direct Known Subclasses

Hash


Constructor Details

- (ObjectHash) initialize(input, opts = {})

Creates a new ObjectHash object. Input can be a string with a valid filename or an IO-like object.

Valid options:

:password - the user password to decrypt the source PDF


# File 'lib/pdf/reader/object_hash.rb', line 41

def initialize(input, opts = {})
  @io          = extract_io_from(input)
  @pdf_version = read_version
  @xref        = PDF::Reader::XRef.new(@io)
  @trailer     = @xref.trailer
  @cache       = PDF::Reader::ObjectCache.new
  @sec_handler = build_security_handler(opts)
end
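The constructor's first step, extract_io_from, is private and not shown on this page. A minimal sketch of how "a filename or an IO-like object" can be normalised follows; io_from is a hypothetical helper, and the exact checks pdf-reader performs may differ.

```ruby
require "stringio"

# Hypothetical helper (not pdf-reader's extract_io_from): accept either an
# IO-like object or the path of an existing file, and return something that
# supports seek and read.
def io_from(input)
  if input.respond_to?(:seek) && input.respond_to?(:read)
    input                          # already IO-like: File, StringIO, Tempfile...
  elsif File.file?(input.to_s)
    File.open(input.to_s, "rb")    # binary mode, since PDFs contain raw bytes
  else
    raise ArgumentError, "expected an IO-like object or an existing filename"
  end
end

sio = StringIO.new("%PDF-1.4\n")
io_from(sio).equal?(sio)           # => true, the object is passed through
```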

Instance Attribute Details

- (Object) default

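Returns the value of attribute default. [] returns this value for keys less than or equal to 0 and when an InvalidObjectError is raised during the lookup.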



# File 'lib/pdf/reader/object_hash.rb', line 31

def default
  @default
end

- (Object) pdf_version (readonly)

Returns the value of attribute pdf_version



# File 'lib/pdf/reader/object_hash.rb', line 32

def pdf_version
  @pdf_version
end
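pdf_version is populated by the private read_version method, which is not shown on this page. The following is a hedged sketch of one way to read the version from a %PDF-x.y header; header_version is hypothetical, and both the return type (a Float here) and the 1024-byte search window are assumptions rather than pdf-reader's documented behaviour.

```ruby
require "stringio"

# Hypothetical sketch: find a "%PDF-x.y" marker near the start of the stream
# and return the version as a Float, or nil when no header is present.
def header_version(io)
  io.rewind
  head = io.read(1024).to_s            # read returns nil on an empty stream
  match = head.match(/%PDF-(\d+\.\d+)/)
  match && match[1].to_f
end

header_version(StringIO.new("%PDF-1.7\n1 0 obj\n"))  # => 1.7
header_version(StringIO.new("not a pdf"))            # => nil
```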

- (Object) trailer (readonly)

Returns the value of attribute trailer



# File 'lib/pdf/reader/object_hash.rb', line 32

def trailer
  @trailer
end

Instance Method Details

- (Object) [](key)

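Access an object from the PDF. key can be an Integer or a PDF::Reader::Reference object.

If an Integer is used, the object with that ID and a generation number of 0 will be returned.

If a PDF::Reader::Reference object is used, the exact ID and generation number can be specified.

If key is less than or equal to 0, or an InvalidObjectError is raised during the lookup, the value of the default attribute is returned.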



# File 'lib/pdf/reader/object_hash.rb', line 71

def [](key)
  return default if key.to_i <= 0

  unless key.is_a?(PDF::Reader::Reference)
    key = PDF::Reader::Reference.new(key.to_i, 0)
  end

  if @cache.has_key?(key)
    @cache[key]
  elsif xref[key].is_a?(Integer) # Fixnum was removed in Ruby 3.2
    buf = new_buffer(xref[key])
    @cache[key] = decrypt(key, Parser.new(buf, self).object(key.id, key.gen))
  elsif xref[key].is_a?(PDF::Reader::Reference)
    container_key = xref[key]
    object_streams[container_key] ||= PDF::Reader::ObjectStream.new(object(container_key))
    @cache[key] = object_streams[container_key][key.id]
  end
rescue InvalidObjectError
  return default
end
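The control flow above (normalise the key, short-circuit invalid IDs to default, consult the cache before parsing) can be seen in isolation in this simplified sketch. CacheRef and CachedLookup are hypothetical; the "parse" step is just Integer() on a string, standing in for the real Parser, and the object-stream branch is omitted.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference. Like the real class, it
# responds to to_i (returning the ID), because [] calls key.to_i on every key.
CacheRef = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

class CachedLookup
  attr_accessor :default
  attr_reader :parse_count

  def initialize(raw)            # raw: { CacheRef => unparsed String }
    @raw = raw
    @cache = {}
    @parse_count = 0
  end

  def [](key)
    return default if key.to_i <= 0                 # invalid IDs
    key = CacheRef.new(key.to_i, 0) unless key.is_a?(CacheRef)
    return @cache[key] if @cache.key?(key)          # already parsed
    return default unless @raw.key?(key)            # unknown object
    @parse_count += 1                               # count the expensive step
    @cache[key] = Integer(@raw[key])                # stand-in for the parser
  end
end

h = CachedLookup.new({ CacheRef.new(1, 0) => "3469" })
h[1]                  # => 3469 (parsed)
h[CacheRef.new(1, 0)] # => 3469 (served from the cache)
h.parse_count         # => 1
```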

- (Object) deref!(key)

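Recursively dereferences the object referred to by key, replacing references nested inside Hashes, Arrays and stream dictionaries in place. If key is not a PDF::Reader::Reference, it is used as-is: scalars are returned unchanged, while a Hash or Array passed as key still has its nested references resolved in place.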



# File 'lib/pdf/reader/object_hash.rb', line 103

def deref!(key)
  case object = deref(key)

    when Hash
      object.each do |key, value|
        object[key] = deref! value
      end

    when PDF::Reader::Stream
      deref! object.hash

    when Array
      object.each_with_index do |value, index|
        object[index] = deref! value
      end

  end

  object
end
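The recursion in deref! can be illustrated on plain Ruby data. In this sketch, Link and deep_deref are hypothetical: objects is a plain Hash from ID to value, and, unlike deref!, the sketch returns new containers rather than mutating in place and has no Stream branch. It also assumes the object graph is acyclic; real PDFs commonly contain cycles (page tree /Parent entries, for example), so a general-purpose version would need cycle detection.

```ruby
# Hypothetical reference type, standing in for PDF::Reader::Reference.
Link = Struct.new(:id)

# Resolve references recursively inside Hashes and Arrays.
# Assumes an acyclic graph: a cycle of links would recurse forever.
def deep_deref(objects, value)
  value = objects[value.id] while value.is_a?(Link)   # follow reference chains
  case value
  when Hash  then value.transform_values { |v| deep_deref(objects, v) }
  when Array then value.map { |v| deep_deref(objects, v) }
  else value
  end
end

objects = {
  1 => { Type: :Catalog, Pages: Link.new(2) },
  2 => { Type: :Pages, Kids: [Link.new(3)], Count: 1 },
  3 => { Type: :Page }
}
deep_deref(objects, Link.new(1))
# returns the catalog with Pages and Kids replaced by the objects they point to
```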

- (Object) each(&block) Also known as: each_pair

Iterates over each key/value pair, just like a Ruby hash.



# File 'lib/pdf/reader/object_hash.rb', line 149

def each(&block)
  @xref.each do |ref|
    yield ref, self[ref]
  end
end

- (Object) each_key(&block)

Iterates over each key, just like a Ruby hash.



# File 'lib/pdf/reader/object_hash.rb', line 158

def each_key(&block)
  each do |id, obj|
    yield id
  end
end

- (Object) each_value(&block)

Iterates over each value, just like a Ruby hash.



# File 'lib/pdf/reader/object_hash.rb', line 166

def each_value(&block)
  each do |id, obj|
    yield obj
  end
end
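Both each_key and each_value are layered on each, which calls self[ref] for every reference, so even iterating over keys alone resolves every object (cached results aside). The hypothetical class below, LayeredIteration, reproduces that layering with a lookup counter to make the cost visible.

```ruby
# Hypothetical class mirroring how ObjectHash layers each_key and each_value
# on top of each. The counter records how many values are looked up.
class LayeredIteration
  attr_reader :lookups

  def initialize(table)
    @table = table
    @lookups = 0
  end

  def [](id)
    @lookups += 1
    @table[id]
  end

  def each
    @table.each_key { |id| yield id, self[id] }
  end

  def each_key
    each { |id, _obj| yield id }
  end

  def each_value
    each { |_id, obj| yield obj }
  end
end

t = LayeredIteration.new({ 1 => :catalog, 2 => :page })
t.each_key { |id| puts id }  # prints 1 then 2
t.lookups                    # => 2, values were fetched even though unused
```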

- (Boolean) empty?

Returns true if there are no objects in this file.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 181

def empty?
  size == 0 ? true : false
end

- (Boolean) encrypted?

Returns true if the file's trailer contains an :Encrypt entry, i.e. the PDF is encrypted.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 261

def encrypted?
  trailer.has_key?(:Encrypt)
end

- (Object) fetch(key, local_default = nil)

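Access an object from the PDF. key can be an Integer or a PDF::Reader::Reference object.

If an Integer is used, the object with that ID and a generation number of 0 will be returned.

If a PDF::Reader::Reference object is used, the exact ID and generation number can be specified.

local_default is returned if the requested object doesn't exist. Both checks test truthiness, so an object whose value is false is treated as missing. When neither yields a value, IndexError is raised if key is less than or equal to 0; for any other key, nil is returned.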



# File 'lib/pdf/reader/object_hash.rb', line 136

def fetch(key, local_default = nil)
  obj = self[key]
  if obj
    return obj
  elsif local_default
    return local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end
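The fall-through logic of fetch is easy to misread, so this sketch reproduces its branches on a plain Hash. fetch_like is hypothetical and simplifies one detail: keys less than or equal to 0 yield nil here, whereas ObjectHash#[] returns the default attribute for them.

```ruby
# Hypothetical re-creation of fetch's branching over a plain Hash of
# ID => object. Both checks test truthiness, so a stored false falls through.
def fetch_like(objects, key, local_default = nil)
  obj = key.to_i > 0 ? objects[key.to_i] : nil
  if obj
    obj
  elsif local_default
    local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end

objects = { 1 => 3469, 2 => false }
fetch_like(objects, 1)             # => 3469
fetch_like(objects, 99, :missing)  # => :missing
fetch_like(objects, 2, :missing)   # => :missing, because false is falsy
fetch_like(objects, 99)            # => nil, no error for a positive ID
```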

- (Boolean) has_key?(check_key) Also known as: include?, key?, member?, value?

Returns true if the specified key exists in the file. key can be an Integer or a PDF::Reader::Reference; an Integer matches an object with that ID in any generation.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 188

def has_key?(check_key)
  # TODO update from O(n) to O(1)
  each_key do |key|
    if check_key.kind_of?(PDF::Reader::Reference)
      return true if check_key == key
    else
      return true if check_key.to_i == key.id
    end
  end
  return false
end
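The TODO in has_key? notes that the lookup is O(n). One possible O(1) approach, sketched below under the assumption that the set of cross-reference keys does not change after the file is opened, is to index the keys once. KeyRef and KeyIndex are hypothetical; the matching rules copy the method above: references must match exactly, while other keys match on ID alone.

```ruby
require "set"

# Hypothetical stand-in for PDF::Reader::Reference.
KeyRef = Struct.new(:id, :gen)

# Build the indexes once; afterwards each query is a hash-based lookup.
class KeyIndex
  def initialize(refs)
    @refs = refs.to_set             # exact (id, gen) pairs
    @ids  = refs.map(&:id).to_set   # IDs alone, for Integer-style keys
  end

  def has_key?(check_key)
    if check_key.is_a?(KeyRef)
      @refs.include?(check_key)     # reference: ID and generation must match
    else
      @ids.include?(check_key.to_i) # anything else: match on ID only
    end
  end
end

index = KeyIndex.new([KeyRef.new(1, 0), KeyRef.new(2, 1)])
index.has_key?(2)                 # => true, any generation of ID 2
index.has_key?(KeyRef.new(2, 0))  # => false, generation differs
```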

- (Boolean) has_value?(value)

Returns true if the specified value exists in the file.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 205

def has_value?(value)
  # TODO update from O(n) to O(1)
  each_value do |obj|
    return true if obj == value
  end
  return false
end

- (Object) keys

Returns an array of all keys in the file.



# File 'lib/pdf/reader/object_hash.rb', line 220

def keys
  ret = []
  each_key { |k| ret << k }
  ret
end

- (Object) obj_type(ref)

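Returns the type of object a ref points to, as a Symbol built from the object's class name (for example :Hash), or nil if the lookup raises an error.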



# File 'lib/pdf/reader/object_hash.rb', line 51

def obj_type(ref)
  self[ref].class.to_s.to_sym
rescue
  nil
end
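obj_type reports the Ruby class name of the resolved object as a Symbol. The sketch below applies the same expression to plain values; type_symbol and the Demo::Stream class are hypothetical. It shows that namespaced classes keep their full name, and that nil becomes :NilClass, so obj_type itself only returns nil when the lookup raises.

```ruby
# Hypothetical namespaced class, standing in for PDF::Reader::Stream.
module Demo
  class Stream; end
end

# The same expression obj_type applies to the object it looks up.
def type_symbol(value)
  value.class.to_s.to_sym
end

type_symbol({})               # => :Hash
type_symbol(Demo::Stream.new) # => :"Demo::Stream"
type_symbol(nil)              # => :NilClass
```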

- (Object) object(key) Also known as: deref

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise, return key untouched.



# File 'lib/pdf/reader/object_hash.rb', line 95

def object(key)
  key.is_a?(PDF::Reader::Reference) ? self[key] : key
end

- (Object) page_references

Returns an array of PDF::Reader::Reference objects. Each reference in the array points to a Page object, one for each page in the PDF. The first reference is page 1, the second is page 2, and so on.

Useful for apps that want to extract data from specific pages.



# File 'lib/pdf/reader/object_hash.rb', line 256

def page_references
  root  = fetch(trailer[:Root])
  @page_references ||= get_page_objects(root[:Pages]).flatten
end
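page_references delegates the tree walk to the private get_page_objects, which is not documented here. As a hedged illustration of the idea, the sketch below performs a depth-first, left-to-right walk of a page tree stored in a plain Hash, returning leaf /Page references in document order. PageRef and page_refs are hypothetical, and real page trees carry more entries (such as /Parent and /Count) that this walk ignores.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference.
PageRef = Struct.new(:id)

# Depth-first, left-to-right walk: intermediate /Pages nodes contribute their
# kids in order, leaf /Page nodes contribute their own reference.
def page_refs(objects, ref)
  node = objects.fetch(ref.id)
  if node[:Type] == :Pages
    node[:Kids].flat_map { |kid| page_refs(objects, kid) }
  else
    [ref]
  end
end

tree = {
  1 => { Type: :Pages, Kids: [PageRef.new(2), PageRef.new(3)] },
  2 => { Type: :Page },
  3 => { Type: :Pages, Kids: [PageRef.new(4), PageRef.new(5)] },
  4 => { Type: :Page },
  5 => { Type: :Page }
}
page_refs(tree, PageRef.new(1)).map(&:id)  # => [2, 4, 5]
```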

- (Object) size Also known as: length

Returns the number of objects in the file. An object with multiple generations is counted once.



# File 'lib/pdf/reader/object_hash.rb', line 174

def size
  xref.size
end

- (Boolean) stream?(ref)

Returns true if the supplied reference points to an object with a stream.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 58

def stream?(ref)
  self.has_key?(ref) && self[ref].is_a?(PDF::Reader::Stream)
end

- (Object) to_a

Returns an array of arrays. Each sub-array contains a key/value pair.



# File 'lib/pdf/reader/object_hash.rb', line 242

def to_a
  ret = []
  each do |id, obj|
    ret << [id, obj]
  end
  ret
end

- (Object) to_s

Returns a short summary string naming the class and the number of objects in the file.

# File 'lib/pdf/reader/object_hash.rb', line 214

def to_s
  "<PDF::Reader::ObjectHash size: #{self.size}>"
end

- (Object) values

Returns an array of all values in the file.



# File 'lib/pdf/reader/object_hash.rb', line 228

def values
  ret = []
  each_value { |v| ret << v }
  ret
end

- (Object) values_at(*ids)

Returns an array of the values for the specified keys, in the order given.



# File 'lib/pdf/reader/object_hash.rb', line 236

def values_at(*ids)
  ids.map { |id| self[id] }
end