Class: Hpricot::Elements
Overview
Once you've matched a list of elements, you will often need to handle them as a group, or perform the same action on each of them. Hpricot::Elements is an extension of Ruby's Array class, with some methods added for altering the elements contained in the array.
If you need to create an element array from regular elements:
Hpricot::Elements[ele1, ele2, ele3]
This assumes that ele1, ele2 and ele3 hold element objects (Hpricot::Elem, Hpricot::Doc, etc.).
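The bracket constructor comes for free from Array: calling `[]` on an Array subclass returns an instance of that subclass. The sketch below shows this in plain Ruby. `MiniElements` and its `shout` method are hypothetical stand-ins invented for illustration, not part of Hpricot.

```ruby
# MiniElements is a hypothetical stand-in for Hpricot::Elements, not the real class.
class MiniElements < Array
  # An example group-level helper: upcase every item in the list.
  def shout
    MiniElements[*map { |item| item.to_s.upcase }]
  end
end

list = MiniElements["ele1", "ele2", "ele3"]
puts list.class        # the subclass, not plain Array
puts list.length
puts list.shout.inspect
```

Because the list is still an Array, the usual each, map and select calls keep working alongside any added helpers.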
Continuing Searches
Usually the Hpricot::Elements you're working on comes from a search you've done. Well, you can continue searching the list by using the same at and search methods you can use on plain elements.
elements = doc.search("/div/p")
elements = elements.search("/a[@href='http://hoodwink.d/']")
elements = elements.at("img")
Altering Elements
When you're altering elements in the list, your changes will be reflected in the document you started searching from.
doc = Hpricot("That's my <b>spoon</b>, Tyler.")
doc.at("b").swap("<i>fork</i>")
doc.to_html
#=> "That's my <i>fork</i>, Tyler."
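One way to picture why changes show up in the document: a matched list holds references to the same node objects the document tree holds. The sketch below models that in plain Ruby. `Node` and `render` are hypothetical stand-ins, and this is not how Hpricot itself is implemented.

```ruby
# Node is a hypothetical stand-in for an Hpricot element.
Node = Struct.new(:name, :children)

bold = Node.new("b", ["spoon"])
doc  = Node.new("doc", ["That's my ", bold, ", Tyler."])

# A "search result" is just an array holding the same node objects.
matches = doc.children.select { |c| c.is_a?(Node) && c.name == "b" }

# Mutating through the list changes the node the document also points to.
matches.each do |node|
  node.name = "i"
  node.children = ["fork"]
end

# Serialize the tree; the root "doc" node has no tag of its own.
def render(node)
  return node unless node.is_a?(Node)
  inner = node.children.map { |c| render(c) }.join
  node.name == "doc" ? inner : "<#{node.name}>#{inner}</#{node.name}>"
end

puts render(doc)
```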
Getting More Detailed
If you can't find a method here that does what you need, you may need to loop through the elements and find a method in Hpricot::Container::Trav which can do what you need.
For example, you may want to search for all the H3 header tags in a document and grab all the tags underneath the header, but not inside the header. A good method for this is next_sibling:
ary = []
doc.search("h3").each do |h3|
  ele = h3
  # advance the cursor each pass; next_sibling returns nil at the end
  while ele = ele.next_sibling
    ary << ele   # stuff away all the elements under the h3
  end
end
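The sibling walk needs a cursor that moves forward on every pass, otherwise the loop never ends. Here is a stand-alone sketch of that pattern which also stops at the next header. `SibNode` is a hypothetical stand-in that models only `next_sibling`; it is not an Hpricot class.

```ruby
# SibNode is a hypothetical stand-in for an element; only next_sibling is modeled.
class SibNode
  attr_reader :name
  attr_accessor :parent_children

  def initialize(name)
    @name = name
  end

  # The node after this one in the parent's child list, or nil at the end.
  def next_sibling
    idx = parent_children.index { |n| n.equal?(self) }
    parent_children[idx + 1]
  end
end

nodes = %w[h3 p ul h3 p].map { |n| SibNode.new(n) }
nodes.each { |n| n.parent_children = nodes }

sections = nodes.select { |n| n.name == "h3" }.map do |h3|
  under = []
  ele = h3
  # Advance the cursor each pass, and stop at the next header.
  while (ele = ele.next_sibling) && ele.name != "h3"
    under << ele.name
  end
  under
end

p sections
```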
Most of the useful element methods are in the mixins Hpricot::Traverse and Hpricot::Container::Trav.
Constant Summary
- ATTR_RE =
%r!\[ *(?:(@)([\w\(\)-]+)|([\w\(\)-]+\(\))) *([~\!\|\*$\^=]*) *'?"?([^'"]*)'?"? *\]!i
- BRACK_RE =
%r!(\[) *([^\]]*) *\]+!i
- FUNC_RE =
%r!(:)?([a-zA-Z0-9\*_-]*)\( *[\"']?([^ \)]*?)['\"]? *\)!
- CUST_RE =
%r!(:)([a-zA-Z0-9\*_-]*)()!
- CATCH_RE =
%r!([:\.#]*)([a-zA-Z0-9\*_-]+)!
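To get a feel for what these patterns capture, the sketch below runs three of them against small selector fragments in plain Ruby. The patterns are copied from the constants listed here; the sample strings are assumptions chosen for illustration.

```ruby
# Patterns copied from the constant summary; sample inputs are illustrative.
ATTR_RE  = %r!\[ *(?:(@)([\w\(\)-]+)|([\w\(\)-]+\(\))) *([~\!\|\*$\^=]*) *'?"?([^'"]*)'?"? *\]!i
FUNC_RE  = %r!(:)?([a-zA-Z0-9\*_-]*)\( *[\"']?([^ \)]*?)['\"]? *\)!
CATCH_RE = %r!([:\.#]*)([a-zA-Z0-9\*_-]+)!

attr  = ATTR_RE.match("[@href='http://hoodwink.d/']").captures
func  = FUNC_RE.match(":nth(2)").captures
klass = CATCH_RE.match(".bacon").captures

p attr   # marker, attribute name, (unused alternative), operator, value
p func   # colon, function name, argument
p klass  # prefix character(s), name
```

The unused alternative in ATTR_RE comes back as nil, which is presumably why the filter source calls `m.compact!` before inspecting the captures.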
Class Method Summary
- + (Object) expand(ele1, ele2, excl = false)
Given two elements, attempt to gather an Elements array of everything between (and including) those two elements.
- + (Object) filter(nodes, expr, truth = true)
Instance Method Summary
- - (Object) add_class(class_name)
Adds the class to all matched elements.
- - (Object) after(str = nil, &blk)
Just after each element in this list, add some HTML.
- - (Object) append(str = nil, &blk)
Add to the end of the contents inside each element in this list.
- - (Object) at(expr, &blk)
(also: #%)
Searches this list for the first element (or child of these elements) matching the CSS or XPath expression expr.
- - (Object) attr(key, value = nil, &blk)
(also: #set)
Gets and sets attributes on all matched elements.
- - (Object) before(str = nil, &blk)
Add some HTML just previous to each element in this list.
- - (Object) empty
Empty the elements in this list, by removing their insides.
- - (Object) filter(expr)
- - (Object) inner_html(*string)
(also: #html, #innerHTML)
Returns an HTML fragment built of the contents of each element in this list.
- - (Object) inner_html=(string)
(also: #html=, #innerHTML=)
Replaces the contents of each element in this list.
- - (Object) inner_text
(also: #text)
Returns a string containing the text contents of each element in this list.
- - (Object) not(expr)
- - (Object) prepend(str = nil, &blk)
Add to the start of the contents inside each element in this list.
- - (Object) pretty_print(q)
- - (Object) remove
Remove all elements in this list from the document which contains them.
- - (Object) remove_attr(name)
Remove an attribute from each of the matched elements.
- - (Object) remove_class(name = nil)
Removes a class from all matched elements.
- - (Object) search(*expr, &blk)
(also: #/)
Searches this list for any elements (or children of these elements) matching the CSS or XPath expression expr.
- - (Object) to_html
(also: #to_s)
Convert this group of elements into a complete HTML fragment, returned as a string.
- - (Object) wrap(str = nil, &blk)
Wraps each element in the list inside the element created by HTML str.
Methods inherited from Array
#_to_s, #clear, #clear_all, #dark?, #light?
Class Method Details
+ (Object) expand(ele1, ele2, excl = false)
Given two elements, attempt to gather an Elements array of everything between (and including) those two elements.
# File 'lib/ext/hpricot/elements.rb', line 315
def self.expand(ele1, ele2, excl=false)
  ary = []
  offset = excl ? -1 : 0

  if ele1 and ele2
    # let's quickly take care of siblings
    if ele1.parent == ele2.parent
      ary = ele1.parent.children[ele1.node_position..(ele2.node_position+offset)]
    else
      # find common parent
      p, ele1_p = ele1, [ele1]
      ele1_p.unshift p while p.respond_to?(:parent) and p = p.parent
      p, ele2_p = ele2, [ele2]
      ele2_p.unshift p while p.respond_to?(:parent) and p = p.parent
      common_parent = ele1_p.zip(ele2_p).select { |p1, p2| p1 == p2 }.flatten.last

      child = nil
      if ele1 == common_parent
        child = ele2
      elsif ele2 == common_parent
        child = ele1
      end

      if child
        ary = common_parent.children[0..(child.node_position+offset)]
      end
    end
  end

  return Elements[*ary]
end
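For the common case where both elements share a parent, the source reduces to an array slice between the two node positions, shortened by one when excl is true. A minimal sketch of just that slice follows. `between` is a hypothetical helper, with a plain array standing in for `ele1.parent.children` and integers standing in for `node_position`.

```ruby
# Hypothetical helper: `siblings` stands in for ele1.parent.children, and the
# two positions stand in for ele1.node_position and ele2.node_position.
def between(siblings, pos1, pos2, excl = false)
  offset = excl ? -1 : 0
  siblings[pos1..(pos2 + offset)]
end

kids = %w[h3 p ul p h3]
p between(kids, 0, 4)        # inclusive of both ends
p between(kids, 0, 4, true)  # drops the closing element
```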
+ (Object) filter(nodes, expr, truth = true)
# File 'lib/ext/hpricot/elements.rb', line 270
def self.filter(nodes, expr, truth = true)
  until expr.empty?
    _, *m = *expr.match(/^(?:#{ATTR_RE}|#{BRACK_RE}|#{FUNC_RE}|#{CUST_RE}|#{CATCH_RE})/)
    break unless _

    expr = $'
    m.compact!
    if m[0] == '@'
      m[0] = "@#{m.slice!(2,1).join}"
    end

    if m[0] == '[' && m[1] =~ /^\d+$/
      m = [":", "nth", m[1].to_i-1]
    end

    if m[0] == ":" && m[1] == "not"
      nodes, = Elements.filter(nodes, m[2], false)
    elsif "#{m[0]}#{m[1]}" =~ /^(:even|:odd)$/
      new_nodes = []
      nodes.each_with_index {|n,i| new_nodes.push(n) if (i % 2 == (m[1] == "even" ? 0 : 1)) }
      nodes = new_nodes
    elsif "#{m[0]}#{m[1]}" =~ /^(:first|:last)$/
      nodes = [nodes.send(m[1])]
    else
      meth = "filter[#{m[0]}#{m[1]}]" unless m[0].empty?
      if meth and Traverse.method_defined? meth
        args = m[2..-1]
      else
        meth = "filter[#{m[0]}]"
        if Traverse.method_defined? meth
          args = m[1..-1]
        end
      end
      args << -1
      nodes = Elements[*nodes.find_all do |x|
        args[-1] += 1
        x.send(meth, *args) ? truth : !truth
      end]
    end
  end
  [nodes, expr]
end
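Two ideas in this source are easy to try in isolation: the truth flag, which inverts a test so the same code serves both filter and :not, and the :even / :odd branch, which selects by zero-based position. The sketch below uses plain strings as nodes. `keep` and `parity` are hypothetical helpers, not Hpricot methods.

```ruby
# Hypothetical helper mirroring the truth flag: keep nodes whose test result
# agrees with `truth`.
def keep(nodes, truth = true)
  nodes.find_all { |n| yield(n) ? truth : !truth }
end

# Hypothetical helper mirroring the :even / :odd branch (positions are zero-based).
def parity(nodes, which)
  want = which == "even" ? 0 : 1
  nodes.each_with_index.select { |_, i| i % 2 == want }.map(&:first)
end

tags = %w[p div a span]
p keep(tags) { |t| t == "p" }          # nodes that match
p keep(tags, false) { |t| t == "p" }   # what a :not(...) pass would keep
p parity(tags, "even")
p parity(tags, "odd")
```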
Instance Method Details
- (Object) add_class(class_name)
Adds the class to all matched elements.
(doc/"p").add_class("bacon")
Now all paragraphs will have class="bacon".
# File 'lib/ext/hpricot/elements.rb', line 222
def add_class class_name
  each do |el|
    next unless el.respond_to? :get_attribute
    classes = el.get_attribute('class').to_s.split(" ")
    el.set_attribute('class', classes.push(class_name).uniq.join(" "))
  end
  self
end
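The class handling is plain string work: split on spaces, push the new name, drop duplicates, join again. A small sketch of that step on raw strings, assuming a hypothetical `with_class` helper in place of a real element:

```ruby
# Hypothetical helper that works on the raw class string rather than an element.
def with_class(current, class_name)
  current.to_s.split(" ").push(class_name).uniq.join(" ")
end

puts with_class(nil, "bacon")            # no class attribute yet
puts with_class("crisp", "bacon")
puts with_class("crisp bacon", "bacon")  # already present, not duplicated
```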
- (Object) after(str = nil, &blk)
Just after each element in this list, add some HTML. Pass in an HTML str, which is turned into Hpricot elements.
# File 'lib/ext/hpricot/elements.rb', line 150
def after(str = nil, &blk)
  each { |x| x.parent.insert_after x.make(str, &blk), x }
end
- (Object) append(str = nil, &blk)
Add to the end of the contents inside each element in this list. Pass in an HTML str, which is turned into Hpricot elements.
# File 'lib/ext/hpricot/elements.rb', line 132
def append(str = nil, &blk)
  each { |x| x.html(x.children + x.make(str, &blk)) }
end
- (Object) at(expr, &blk) Also known as: %
Searches this list for the first element (or child of these elements) matching the CSS or XPath expression expr. Root is assumed to be the element scanned.
See Hpricot::Container::Trav.at for more.
# File 'lib/ext/hpricot/elements.rb', line 67
def at(expr, &blk)
  search(expr, &blk).first
end
- (Object) attr(key, value = nil, &blk) Also known as: set
Gets and sets attributes on all matched elements.
Pass in a key on its own and this method will return the string value assigned to that attribute on the first element, or nil if the attribute isn't found.
doc.search("a").attr("href")
#=> "http://hacketyhack.net/"
Or, pass in a key and value. This will set an attribute for all matched elements.
doc.search("p").attr("class", "basic")
You may also use a Hash to set a series of attributes:
(doc/"a").attr(:class => "basic", :href => "http://hackety.org/")
Lastly, a block can be used to rewrite an attribute based on the element it belongs to. The block will pass in an element. Return from the block the new value of the attribute.
records.attr("href") { |e| e['href'] + "#top" }
This example adds a #top anchor to each link.
# File 'lib/ext/hpricot/elements.rb', line 201
def attr key, value = nil, &blk
  if value or blk
    each do |el|
      el.set_attribute(key, value || blk[el])
    end
    return self
  end
  if key.is_a? Hash
    key.each { |k,v| self.attr(k,v) }
    return self
  else
    return self[0].get_attribute(key)
  end
end
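The four calling styles (read, set, Hash, block) all go through one dispatch. The sketch below mirrors that dispatch in plain Ruby so it can be tried without a document. `AttrList` is a hypothetical stand-in in which each "element" is simply a Hash of attributes.

```ruby
# AttrList is a hypothetical stand-in: each "element" is a plain Hash of attributes.
class AttrList < Array
  def attr(key, value = nil, &blk)
    if value || blk
      each { |el| el[key] = value || blk[el] }   # set one attribute on every element
      return self
    end
    if key.is_a?(Hash)
      key.each { |k, v| attr(k, v) }             # set several attributes
      self
    else
      self[0][key]                               # read from the first element only
    end
  end
end

links = AttrList[{ "href" => "/a" }, { "href" => "/b" }]
first_href = links.attr("href")
puts first_href

links.attr("href") { |e| e["href"] + "#top" }
links.attr("class" => "basic", "rel" => "nofollow")
p links
```

Setter forms return the list itself, so calls can be chained; the getter form returns a single value.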
- (Object) before(str = nil, &blk)
Add some HTML just previous to each element in this list. Pass in an HTML str, which is turned into Hpricot elements.
# File 'lib/ext/hpricot/elements.rb', line 144
def before(str = nil, &blk)
  each { |x| x.parent.insert_before x.make(str, &blk), x }
end
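Both before and after hand the new nodes to the parent, which places them next to the target in its child list. A rough model of that placement, using a plain array in place of `parent.children`. The two helpers below are hypothetical and only imitate the idea; they are not the Hpricot implementations.

```ruby
# Hypothetical helpers on a plain array standing in for parent.children.
def insert_before(children, new_nodes, target)
  children.insert(children.index(target), *new_nodes)
end

def insert_after(children, new_nodes, target)
  children.insert(children.index(target) + 1, *new_nodes)
end

kids = ["<p>one</p>", "<p>two</p>"]
insert_before(kids, ["<hr/>"], "<p>two</p>")
insert_after(kids, ["<em>!</em>"], "<p>two</p>")
p kids
```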
- (Object) empty
Empty the elements in this list, by removing their insides.
doc = Hpricot("<p> We have <i>so much</i> to say.</p>")
doc.search("i").empty
doc.to_html
#=> "<p> We have <i></i> to say.</p>"
# File 'lib/ext/hpricot/elements.rb', line 126
def empty
  each { |x| x.inner_html = nil }
end
- (Object) filter(expr)
# File 'lib/ext/hpricot/elements.rb', line 347
def filter(expr)
  nodes, = Elements.filter(self, expr)
  nodes
end
- (Object) inner_html(*string) Also known as: html, innerHTML
Returns an HTML fragment built of the contents of each element in this list.
If an HTML string is supplied, this method acts like inner_html=.
# File 'lib/ext/hpricot/elements.rb', line 82
def inner_html(*string)
  if string.empty?
    map { |x| x.inner_html }.join
  else
    x = self.inner_html = string.pop || x
  end
end
- (Object) inner_html=(string) Also known as: html=, innerHTML=
Replaces the contents of each element in this list. Supply an HTML string, which is loaded into Hpricot objects and inserted into every element in this list.
# File 'lib/ext/hpricot/elements.rb', line 95
def inner_html=(string)
  each { |x| x.inner_html = string }
end
- (Object) inner_text Also known as: text
Returns a string containing the text contents of each element in this list. All HTML tags are removed.
# File 'lib/ext/hpricot/elements.rb', line 103
def inner_text
  map { |x| x.inner_text }.join
end
- (Object) not(expr)
# File 'lib/ext/hpricot/elements.rb', line 352
def not(expr)
  if expr.is_a? Traverse
    nodes = self - [expr]
  else
    nodes, = Elements.filter(self, expr, false)
  end
  nodes
end
- (Object) prepend(str = nil, &blk)
Add to the start of the contents inside each element in this list. Pass in an HTML str, which is turned into Hpricot elements.
# File 'lib/ext/hpricot/elements.rb', line 138
def prepend(str = nil, &blk)
  each { |x| x.html(x.make(str, &blk) + x.children) }
end
- (Object) pretty_print(q)
# File 'lib/ext/hpricot/inspect.rb', line 6
def pretty_print(q)
  q.object_group(self) { super }
end
- (Object) remove
Remove all elements in this list from the document which contains them.
doc = Hpricot("<html>Remove this: <b>here</b></html>")
doc.search("b").remove
doc.to_html
#=> "<html>Remove this: </html>"
# File 'lib/ext/hpricot/elements.rb', line 115
def remove
  each { |x| x.parent.children.delete(x) }
end
- (Object) remove_attr(name)
Remove an attribute from each of the matched elements.
(doc/"input").remove_attr("disabled")
# File 'lib/ext/hpricot/elements.rb', line 235
def remove_attr name
  each do |el|
    next unless el.respond_to? :remove_attribute
    el.remove_attribute(name)
  end
  self
end
- (Object) remove_class(name = nil)
Removes a class from all matched elements.
(doc/"span").remove_class("lightgrey")
Or, to remove all classes:
(doc/"span").remove_class
# File 'lib/ext/hpricot/elements.rb', line 251
def remove_class name = nil
  each do |el|
    next unless el.respond_to? :get_attribute
    if name
      classes = el.get_attribute('class').to_s.split(" ")
      el.set_attribute('class', (classes - [name]).uniq.join(" "))
    else
      el.remove_attribute("class")
    end
  end
  self
end
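When a name is given, the source subtracts it from the split class list and writes the remainder back. A small sketch of that subtraction on raw strings, using a hypothetical `without_class` helper:

```ruby
# Hypothetical helper that works on the raw class string rather than an element.
def without_class(current, name)
  (current.to_s.split(" ") - [name]).uniq.join(" ")
end

puts without_class("lightgrey bold", "lightgrey")
puts without_class("bold", "lightgrey")              # name absent, nothing changes
puts without_class("lightgrey", "lightgrey").inspect # last class removed
```

Note that removing the last class by name leaves an empty string, which the source writes back as an empty class attribute; only the no-argument form removes the attribute itself.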
- (Object) search(*expr, &blk) Also known as: /
Searches this list for any elements (or children of these elements) matching the CSS or XPath expression expr. Root is assumed to be the element scanned.
See Hpricot::Container::Trav.search for more.
# File 'lib/ext/hpricot/elements.rb', line 58
def search(*expr,&blk)
  Elements[*map { |x| x.search(*expr,&blk) }.flatten.uniq]
end
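The list-level search asks every member to search itself, then flattens the results and removes duplicates, which matters when members of the list overlap. A minimal model in plain Ruby follows. `Branch` and `Group` are hypothetical stand-ins, and matching is by tag name only rather than by CSS or XPath.

```ruby
# Branch is a hypothetical stand-in for an element; its search returns
# descendants whose name equals the given tag.
class Branch
  attr_reader :name, :kids

  def initialize(name, kids = [])
    @name = name
    @kids = kids
  end

  def search(tag)
    kids.flat_map { |k| (k.name == tag ? [k] : []) + k.search(tag) }
  end
end

# Group is a hypothetical stand-in for the element list.
class Group < Array
  # Search every member, then flatten and de-duplicate the combined results.
  def search(tag)
    Group[*map { |x| x.search(tag) }.flatten.uniq]
  end

  # First match only, built on search.
  def at(tag)
    search(tag).first
  end
end

link  = Branch.new("a")
inner = Branch.new("p", [link])
outer = Branch.new("div", [inner])

group = Group[outer, inner]   # overlapping members: inner sits inside outer
found = group.search("a")
puts found.length
puts found.class
```

Both members reach the same link, but it appears once in the result because of the uniq step.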
- (Object) to_html Also known as: to_s
Convert this group of elements into a complete HTML fragment, returned as a string.
# File 'lib/ext/hpricot/elements.rb', line 74
def to_html
  map { |x| x.output("") }.join
end
- (Object) wrap(str = nil, &blk)
Wraps each element in the list inside the element created by HTML str. If more than one element is found in the string, Hpricot locates the deepest spot inside the first element.
doc.search("a[@href]").
wrap(%{<div class="link"><div class="link_inner"></div></div>})
This code wraps every link on the page inside a div.link and a div.link_inner nest.
# File 'lib/ext/hpricot/elements.rb', line 162
def wrap(str = nil, &blk)
  each do |x|
    wrap = x.make(str, &blk)
    nest = wrap.detect { |w| w.respond_to? :children }
    unless nest
      raise "No wrapping element found."
    end
    x.parent.replace_child(x, wrap)
    nest = nest.children.first until nest.empty?
    nest.html([x])
  end
end
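The "deepest spot" is found by following first children until an element with no children is reached. A small sketch of that descent follows. `Box` and `deepest` are hypothetical stand-ins; the real code tests `nest.empty?` on Hpricot elements rather than inspecting a children array.

```ruby
# Box is a hypothetical stand-in for an element with a name and a child list.
Box = Struct.new(:name, :children)

# Follow first children until reaching a box with no children.
def deepest(box)
  box = box.children.first until box.children.empty?
  box
end

wrapper = Box.new("div.link", [Box.new("div.link_inner", [])])
spot = deepest(wrapper)
spot.children << "a"   # the wrapped link lands in the innermost element

puts spot.name
p wrapper.children.first.children
```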