Class: GoogleBookSearch

Inherits:
Service
Includes:
MetadataHelper, UmlautHttp
Defined in:
lib/service_adaptors/google_book_search.rb

Overview

Service that searches Google Book Search to determine viewability. It searches by ISBN, OCLCNUM and/or LCCN.

Uses the Google Books API; see code.google.com/apis/books/docs/v1/getting_started.html and code.google.com/apis/books/docs/v1/using.html

If a full view is available, it returns a fulltext service response. If only a partial view is available, it is returned as "limited excerpts". If there is no view at all, it still includes a link in highlighted_links, to pay lip service to Google branding requirements.

Unfortunately there is no way to tell which of the noview books provide search, although some do. Search is advertised only if a full or partial view is available.

If a thumbnail_url is returned in the responses, a cover image is displayed.

Google API Key

Setting an API key in :api_key is STRONGLY recommended; otherwise you will probably get rate limited (it is not clear what the limit is when no API key is supplied). You may have to ask for a higher rate limit for your API key than the default 1000/day, which you can do through the Google API console: code.google.com/apis/console

I requested 50k with this message, and was quickly approved with no questions: "Services for academic library (Johns Hopkins Libraries) web applications to match Google Books availability to items presented by our catalog, OpenURL link resolver, and other software."

Recommend setting your 'per user limit' to something crazy high, as well as requesting more quota.

Constant Summary

# Identifiers used in API response to indicate viewability level
ViewFullValue = 'ALL_PAGES'
ViewPartialValue = 'PARTIAL'
# None might also be 'snippet', but Google doesn't want to distinguish
ViewNoneValue = 'NO_PAGES'
ViewUnknownValue = 'UNKNOWN'
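
The overview describes how each viewability level turns into a kind of service response. A minimal sketch of that mapping is shown here, assuming a hypothetical standalone helper named response_type_for; the adaptor itself spreads this logic across several methods and uses ServiceTypeValue objects rather than bare symbols.

```ruby
# Hypothetical helper, not part of the adaptor: maps a viewability value
# from the API response to the kind of response the overview describes.
VIEW_FULL    = 'ALL_PAGES'
VIEW_PARTIAL = 'PARTIAL'
VIEW_NONE    = 'NO_PAGES'

def response_type_for(viewability)
  case viewability
  when VIEW_FULL    then :fulltext          # full view: fulltext response
  when VIEW_PARTIAL then :excerpts          # partial view: "limited excerpts"
  when VIEW_NONE    then :highlighted_link  # no view: plain info link
  end                                       # anything else: nil, no response
end

puts response_type_for('PARTIAL').inspect
puts response_type_for('UNKNOWN').inspect
```

Returning nil for unrecognized values mirrors the fact that the adaptor only looks for full, partial, and noview entries.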

Constants inherited from Service

Service::LinkOutFilterTask, Service::StandardTask

Instance Attribute Summary

Attributes inherited from Service

#name, #priority, #request, #service_id, #session_id, #status, #task

Instance Method Summary

Methods included from UmlautHttp

#http_fetch, #proxy_like_headers

Methods inherited from Service

#handle_wrapper, #link_out_filter, #preempted_by, required_config_params, #response_to_view_data, #session, #update_session, #view_data_from_service_type

Constructor Details

- (GoogleBookSearch) initialize(config)



# File 'lib/service_adaptors/google_book_search.rb', line 60

def initialize(config)    
  @url = 'https://www.googleapis.com/books/v1/volumes?q='
  
  @display_name = 'Google Books'
  # number of full views to show
  @num_full_views = 1
  # default on, to enhance our metadata with stuff from google
  @referent_enhance = true
  # google api key strongly recommended, otherwise you'll
  # probably get rate limited. 
  @api_key = nil
  
  # While you can theoretically look up by LCCN on Google Books,
  # we have found FREQUENT false positives. There's no longer any
  # way to even report these to Google. By default, don't lookup
  # by LCCN. 
  @lookup_by_lccn = false
  
  super(config)
end

Instance Attribute Details

- (Object) display_name (readonly)

attr_reader is important for tests



# File 'lib/service_adaptors/google_book_search.rb', line 47

def display_name
  @display_name
end

- (Object) num_full_views (readonly)

attr_reader is important for tests



# File 'lib/service_adaptors/google_book_search.rb', line 47

def num_full_views
  @num_full_views
end

- (Object) url (readonly)

attr_reader is important for tests



# File 'lib/service_adaptors/google_book_search.rb', line 47

def url
  @url
end

Instance Method Details

- (Object) add_cover_image(request, url)



# File 'lib/service_adaptors/google_book_search.rb', line 385

def add_cover_image(request, url)
  zoom_url = url.clone
  
  # if we're sent to a page other than the frontcover then strip out the
  # page number and insert front cover
  zoom_url.sub!(/&pg=.*?&/, '&printsec=frontcover&')
  
  # hack out the 'curl' if we can
  zoom_url.sub!('&edge=curl', '')
  
  request.add_service_response({
      :service=>self, 
      :display_text => 'Cover Image',
      :url => zoom_url, 
      :size => "medium"
    },
    [ServiceTypeValue[:cover_image]])
end
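
The two substitutions in add_cover_image are easier to follow on a concrete URL. The sketch below isolates them in a hypothetical helper named frontcover_url; the sample thumbnail URL is made up for illustration and is not taken from a real API response.

```ruby
# Hypothetical helper isolating the URL rewriting done in add_cover_image.
def frontcover_url(url)
  zoom_url = url.dup
  # Replace any page parameter with a request for the front cover.
  zoom_url.sub!(/&pg=.*?&/, '&printsec=frontcover&')
  # Drop the page-curl decoration if present.
  zoom_url.sub!('&edge=curl', '')
  zoom_url
end

# Illustrative URL only (assumed shape).
sample = 'http://books.google.com/books?id=abc&pg=PA5&img=1&zoom=1&edge=curl&source=gbs_api'
puts frontcover_url(sample)
# prints http://books.google.com/books?id=abc&printsec=frontcover&img=1&zoom=1&source=gbs_api
```

Note that the page-number pattern only matches when pg is followed by another parameter; a trailing pg at the very end of the URL is left untouched.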

- (Object) add_search_inside(request, data)



# File 'lib/service_adaptors/google_book_search.rb', line 314

def add_search_inside(request, data)
  # Just take the first one we find, if multiple
  searchable_view = find_entries(data, [ViewFullValue, ViewPartialValue])[0]        
  
  if ( searchable_view )
    url = searchable_view["volumeInfo"]["infoLink"]
    
    request.add_service_response( 
      {:service => self,
      :display_text=>@display_name,
      :url=> remove_query_context(url)},
      [:search_inside]
     )                  
  end
  
end

- (Object) build_headers(request)

We don't need to fake a proxy request anymore, but we still include X-Forwarded-For so google can return location-appropriate availability. If there's an existing X-Forwarded-For, we respect it and add on to it.



# File 'lib/service_adaptors/google_book_search.rb', line 240

def build_headers(request)
  original_forwarded_for = nil
  if (request.http_env && request.http_env['HTTP_X_FORWARDED_FOR'])
    original_forwarded_for = request.http_env['HTTP_X_FORWARDED_FOR']                                  
  end

  # we used to prepare a comma separated list in x-forwarded-for if
  # we had multiple requests, as per the x-forwarded-for spec, but I
  # think Google doesn't like it. 
  
  ip_address = (original_forwarded_for ?
      original_forwarded_for  :
      request.client_ip_addr.to_s)
  
  return {} if ip_address.blank?

  # If we've got a comma-separated list from an X-Forwarded-For, we
  # can't send it on to Google, Google won't accept that, just take
  # the first one in the list, which is actually the ultimate client
  # IP. split returns the whole string if separator isn't found, convenient.
  ip_address = ip_address.split(",").first
  
  # If all we have is an internal/private IP from the internal network,
  # do NOT send that to Google, or Google will give you a 503 error
  # and refuse to process your request, as of 7 Sep 2011. Sigh.
  # Also if it doesn't look like an IPv4 address at all, forget it, don't send it.
  # Private ranges checked: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.
  if (ip_address !~ /\A\d+\.\d+\.\d+\.\d+\z/ ||
     ip_address.start_with?("10.") ||
     ip_address =~ /\A172\.(1[6-9]|2\d|3[01])\./ ||
     ip_address.start_with?("192.168."))
     return {}
  else    
    return {'X-Forwarded-For' => ip_address }
  end
end
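
The private-address check in build_headers can also be expressed with Ruby's standard IPAddr class instead of string prefixes. This is a sketch under that assumption, not the adaptor's code; forwardable_ip and PRIVATE_RANGES are hypothetical names, and only IPv4 is considered.

```ruby
require 'ipaddr'

# Private IPv4 ranges that should never be forwarded to Google.
PRIVATE_RANGES = [
  IPAddr.new('10.0.0.0/8'),
  IPAddr.new('172.16.0.0/12'),
  IPAddr.new('192.168.0.0/16')
].freeze

# Hypothetical helper: returns the first address from an X-Forwarded-For
# style value if it is a public IPv4 address, otherwise nil.
def forwardable_ip(header_value)
  candidate = header_value.to_s.split(',').first.to_s.strip
  return nil if candidate.empty?

  ip = IPAddr.new(candidate)
  return nil unless ip.ipv4?
  return nil if PRIVATE_RANGES.any? { |range| range.include?(ip) }

  candidate
rescue ArgumentError
  # IPAddr's parse errors are ArgumentError subclasses: not an IP at all.
  nil
end

puts forwardable_ip('8.8.8.8, 10.0.0.1').inspect
puts forwardable_ip('172.20.3.4').inspect
```

A caller would send the X-Forwarded-For header only when this returns a non-nil value. Using CIDR ranges avoids the prefix pitfall where 172.160.x.x looks like it starts with "172.16".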

- (Object) create_fulltext_service_response(request, data)

We only create a fulltext service response if we have a full view. We create only as many full views as are specified in config.



# File 'lib/service_adaptors/google_book_search.rb', line 292

def create_fulltext_service_response(request, data)
  display_name = @display_name

  full_views = find_entries(data, ViewFullValue)
  return nil if full_views.empty?
  
  count = 0
  full_views.each do |fv|
    
    uri = fv["volumeInfo"]["previewLink"]
        
    request.add_service_response(
      {:service=>self, 
        :display_text=>display_name, 
        :url=>remove_query_context(uri) },           
      [ :fulltext ]) 
    count += 1
    break if count == @num_full_views
  end   
  return true
end

- (Object) do_query(bibkeys, request)



# File 'lib/service_adaptors/google_book_search.rb', line 202

def do_query(bibkeys, request)    
  headers = build_headers(request)
  link = @url + bibkeys
  if @api_key
    link += "&key=#{@api_key}"
  end
  
  # Add on limit to only request books, not magazines. 
  link += "&printType=books"

  Rails.logger.debug("GoogleBookSearch requesting: #{link}")        
  response = http_fetch(link, :headers => headers, :raise_on_http_error_code => false)        
  data = JSON.parse(response.body)
  
  # If Google gives us an error cause it says it can't geo-locate, 
  # remove the IP, log warning, and try again. 
  
  if (data["error"] && data["error"]["errors"] &&
      data["error"]["errors"].find {|h| h["reason"] == "unknownLocation"} )
    Rails.logger.warn("GoogleBookSearch: geo-locate error, retrying without X-Forwarded-For: '#{link}' headers: #{headers.inspect} #{response.inspect}\n    #{data.inspect}")
    
    response = http_fetch(link, :raise_on_http_error_code => false)        
    data = JSON.parse(response.body)
      
  end
  
  
  if (! response.kind_of?(Net::HTTPSuccess)) || data["error"]      
    Rails.logger.error("GoogleBookSearch error: '#{link}' headers: #{headers.inspect} #{response.inspect}\n    #{data.inspect}")
  end
      
  return data
end
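
The retry in do_query hinges on spotting one particular error reason in the parsed JSON. A minimal sketch of that test follows, assuming a hypothetical helper named geo_locate_error?; the sample payloads are illustrative and contain only the keys that do_query reads.

```ruby
require 'json'

# Hypothetical helper: true when a parsed error payload contains an
# "unknownLocation" reason, the case do_query retries without the header.
def geo_locate_error?(data)
  errors = data.dig('error', 'errors') || []
  errors.any? { |h| h['reason'] == 'unknownLocation' }
end

# Illustrative payloads (assumed shape, not captured from the live API).
geo_error = JSON.parse('{"error": {"errors": [{"reason": "unknownLocation"}]}}')
no_hits   = JSON.parse('{"totalItems": 0}')

puts geo_locate_error?(geo_error)
puts geo_locate_error?(no_hits)
```

Hash#dig keeps the check safe when the "error" key is absent, which is the normal case for a successful response.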

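- (Object) do_web_links(request, data)

Create a web link service response for partial view and noview items: an excerpts response for a partial view, a highlighted_link response for a noview. Only one web link is shown, and a partial view is preferred over a noview. Some noviews have a snippet/search, but we have no way to tell.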



# File 'lib/service_adaptors/google_book_search.rb', line 334

def do_web_links(request, data)

  # some noview items will have a snippet view, but we have no way to tell
  info_views = find_entries(data, ViewPartialValue)
  viewability = ViewPartialValue
  
  if info_views.blank?
    info_views = find_entries(data, ViewNoneValue)
    viewability = ViewNoneValue  
  end
  
  # Shouldn't ever get to this point, but just in case
  return nil if info_views.blank?
  
  url = ''
  iv = info_views.first
  type = nil
  if (viewability == ViewPartialValue && 
      url = iv["volumeInfo"]["previewLink"])
    display_text = @display_name
    type = ServiceTypeValue[:excerpts]
  else
    url = iv["volumeInfo"]["infoLink"]
    display_text = "Book Information"
    type = ServiceTypeValue[:highlighted_link]
  end
  request.add_service_response( { 
      :service=>self,    
      :url=> remove_query_context(url),
      :display_text=>display_text},
        [type]    
     )
end

- (Object) element_enhance(request, rft_key, value)

Will not over-write existing referent values.



# File 'lib/service_adaptors/google_book_search.rb', line 166

def element_enhance(request, rft_key, value)
  if (value)
    request.referent.enhance_referent(rft_key, value.to_s, true, false, :overwrite => false)
  end
end

- (Object) enhance_referent(request, data)

Take the FIRST hit from Google, and use its values to enhance our metadata. Will NOT overwrite existing data.



# File 'lib/service_adaptors/google_book_search.rb', line 120

def enhance_referent(request, data)
  
  entry = data["items"].first
  

  if (volumeInfo = entry["volumeInfo"])
    
    title = volumeInfo["title"]
    title += ": #{volumeInfo["subtitle"]}" if (title && volumeInfo["subtitle"])
    
    element_enhance(request, "title", title)
    element_enhance(request, "au", volumeInfo["authors"].first) if volumeInfo["authors"]
    element_enhance(request, "pub", volumeInfo["publisher"])
    
    element_enhance(request, "tpages", volumeInfo["pageCount"])
    
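    # Parenthesize the assignment: without it, && binds tighter than =,
    # so date would be nil while being matched.
    if (date = volumeInfo["publishedDate"]) && date =~ /^(\d\d\d\d)/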
      element_enhance(request, "date", $1)
    end
    
    # LCCN is only rarely included, but is sometimes, eg:
    # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"LCCN:72627172"}],          
    # Also "LCCN:76630875"
    #
    # And sometimes OCLC number like:
    # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"OCLC:12345678"}],
    #        
    (volumeInfo["industryIdentifiers"] || []).each do |hash|
      
      if hash["type"] == "ISBN_13"
        element_enhance(request, "isbn", hash["identifier"])
        
      elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("LCCN:")
        lccn = normalize_lccn(  hash["identifier"].slice(5, hash["identifier"].length)  )
        request.referent.add_identifier("info:lccn/#{lccn}")
        
      elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("OCLC:")
        oclcnum = normalize_lccn(  hash["identifier"].slice(5, hash["identifier"].length)  )
        request.referent.add_identifier("info:oclcnum/#{oclcnum}")
      end
    
    end              
  end            
end
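
Two small parsing steps inside enhance_referent can be tried in isolation: pulling a four-digit year out of publishedDate, and splitting the "LCCN:" or "OCLC:" prefix off an identifier of type OTHER. The helper names published_year and split_identifier are hypothetical; the identifier values are the ones quoted in the code comments above.

```ruby
# Hypothetical helper: leading four-digit year of a publishedDate value,
# or nil when the value is missing or does not start with a year.
def published_year(published_date)
  published_date.to_s[/\A(\d{4})/, 1]
end

# Hypothetical helper: splits "LCCN:72627172" into ["LCCN", "72627172"].
# Returns nil for identifiers without a recognized prefix.
def split_identifier(identifier)
  prefix, value = identifier.to_s.split(':', 2)
  return nil unless %w[LCCN OCLC].include?(prefix) && value

  [prefix, value]
end

puts published_year('1999-05-01').inspect
puts split_identifier('LCCN:72627172').inspect
puts split_identifier('OCLC:12345678').inspect
```

The sketch stops before normalization; in the adaptor the extracted value is normalized and added to the referent as an info: URI.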

- (Object) find_entries(gbs_response, viewabilities)



# File 'lib/service_adaptors/google_book_search.rb', line 276

def find_entries(gbs_response, viewabilities)
  unless (viewabilities.kind_of?(Array))
    viewabilities = [viewabilities]
  end

  entries = gbs_response["items"].find_all do |entry|
    viewability = entry["accessInfo"]["viewability"]
    (viewability && viewabilities.include?(viewability))           
  end

  return entries
end
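
To see what find_entries selects, it helps to run the same filter over a small hand-built response. The sketch below uses a hypothetical function named entries_with_viewability and made-up sample data; unlike the method above, it tolerates entries without an accessInfo key.

```ruby
# Hypothetical equivalent of find_entries, usable outside the adaptor.
# Accepts a single viewability string or an array of them.
def entries_with_viewability(response, viewabilities)
  wanted = Array(viewabilities)
  (response['items'] || []).select do |entry|
    wanted.include?(entry.dig('accessInfo', 'viewability'))
  end
end

# Made-up response limited to the keys the filter reads.
sample = {
  'items' => [
    { 'id' => 'a', 'accessInfo' => { 'viewability' => 'ALL_PAGES' } },
    { 'id' => 'b', 'accessInfo' => { 'viewability' => 'PARTIAL' } },
    { 'id' => 'c', 'accessInfo' => { 'viewability' => 'NO_PAGES' } },
    { 'id' => 'd' }
  ]
}

puts entries_with_viewability(sample, 'ALL_PAGES').map { |e| e['id'] }.inspect
puts entries_with_viewability(sample, %w[ALL_PAGES PARTIAL]).map { |e| e['id'] }.inspect
```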

- (Object) find_thumbnail_url(data)

Not all responses have a thumbnail_url. We look for them and return the 1st.



# File 'lib/service_adaptors/google_book_search.rb', line 372

def find_thumbnail_url(data)
  entries = data["items"].collect do |entry|      
    entry["volumeInfo"]["imageLinks"]["thumbnail"] if entry["volumeInfo"] && entry["volumeInfo"]["imageLinks"]      
  end
  
  # remove nil values
  entries.compact!    
  
  # pick the first of the available thumbnails, or nil
  return entries[0]
end
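
The same "first available thumbnail" lookup can be written more compactly with Hash#dig, which handles the missing-key cases without explicit guards. This is an alternative sketch, not the adaptor's code; first_thumbnail is a hypothetical name and the URL is a placeholder.

```ruby
# Hypothetical alternative to find_thumbnail_url using Hash#dig.
# Returns the first thumbnail URL found, or nil if no entry has one.
def first_thumbnail(data)
  (data['items'] || [])
    .map { |entry| entry.dig('volumeInfo', 'imageLinks', 'thumbnail') }
    .compact
    .first
end

# Made-up response: only the second entry carries a thumbnail.
sample = {
  'items' => [
    { 'volumeInfo' => { 'title' => 'No image here' } },
    { 'volumeInfo' => { 'imageLinks' => { 'thumbnail' => 'http://example.org/thumb.jpg' } } }
  ]
}

puts first_thumbnail(sample).inspect
```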

- (Object) get_bibkeys(rft)

Returns nil or an escaped string of bibkeys. To increase the chances of a good hit, we send all available bibkeys and later dedupe by id. FIXME: Assumes we only have one of each kind of identifier.



# File 'lib/service_adaptors/google_book_search.rb', line 177

def get_bibkeys(rft)
  isbn = get_identifier(:urn, "isbn", rft)
  oclcnum = get_identifier(:info, "oclcnum", rft)
  lccn = get_lccn(rft)

  # Google doesn't officially support oclc/lccn search, but does
  # index as token with prefix smashed up right with identifier
  # eg http://books.google.com/books/feeds/volumes?q=OCLC32012617
  #
  # Except turns out doing it as a phrase search is important! Or
  # google's normalization/tokenization does odd things. 
  keys = []
  keys << ('isbn:' + isbn) if isbn
  keys << ('"' + "OCLC" + oclcnum + '"') if oclcnum
  # Only use LCCN if we've got nothing else, and we're allowing it. 
  # it returns many false positives. 
  if @lookup_by_lccn && lccn && keys.length == 0
    keys << ('"' + 'LCCN' + lccn + '"')
  end
  
  return nil if keys.empty?
  keys = CGI.escape( keys.join(' OR ') )
  return keys
end
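
The query string that get_bibkeys produces is easiest to understand from a worked example. The sketch below rebuilds it in a hypothetical function named build_query that takes the identifiers as keyword arguments; it uses URI.encode_www_form_component as a stand-in for the CGI.escape call in the source, and the ISBN is an arbitrary illustrative value.

```ruby
require 'uri'

# Hypothetical standalone version of the query-building logic.
# OCLC and LCCN keys are sent as quoted phrases with the prefix attached.
def build_query(isbn: nil, oclcnum: nil, lccn: nil, lookup_by_lccn: false)
  keys = []
  keys << "isbn:#{isbn}" if isbn
  keys << %("OCLC#{oclcnum}") if oclcnum
  # LCCN is a last resort, and only when explicitly allowed.
  keys << %("LCCN#{lccn}") if lookup_by_lccn && lccn && keys.empty?
  return nil if keys.empty?

  URI.encode_www_form_component(keys.join(' OR '))
end

puts build_query(isbn: '9780143039433', oclcnum: '32012617')
# prints isbn%3A9780143039433+OR+%22OCLC32012617%22
puts build_query(lccn: '72627172').inspect
# prints nil, because LCCN lookup is off by default
```

The escaped string is appended directly to the base URL ending in "?q=".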

- (Object) handle(request)



# File 'lib/service_adaptors/google_book_search.rb', line 81

def handle(request)

  bibkeys = get_bibkeys(request.referent)
  return request.dispatched(self, true) if bibkeys.nil?

  data = do_query(bibkeys, request)
  
  
  if data.blank? || data["error"]
    # fail fatal
    return request.dispatched(self, false)
  end
  
  # 0 hits, return. 
  return request.dispatched(self, true) if data["totalItems"] == 0
  
  enhance_referent(request, data) if @referent_enhance
  
  #return full views first
  full_views_shown = create_fulltext_service_response(request, data)
  
  # Add search_inside link if appropriate
  add_search_inside(request, data)
  
  # only if no full view is shown, add links for partial view or noview
  unless full_views_shown
    do_web_links(request, data)
  end
  
  thumbnail_url = find_thumbnail_url(data)
  if thumbnail_url
    add_cover_image(request, thumbnail_url)    
  end

  return request.dispatched(self, true)
end

- (Object) remove_query_context(url)

Google gives us a URL to the book that contains a 'dq' param with the original query, which for us is an ISBN/LCCN/OCLCnum query, which we don't actually want to leave in there.



# File 'lib/service_adaptors/google_book_search.rb', line 407

def remove_query_context(url)
  url.sub(/&dq=[^&]+/, '')    
end
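
A quick illustration of what this substitution does, using a hypothetical helper named strip_dq and a made-up URL:

```ruby
# Hypothetical copy of the substitution in remove_query_context.
def strip_dq(url)
  url.sub(/&dq=[^&]+/, '')
end

# Illustrative URL only (assumed shape).
puts strip_dq('http://books.google.com/books?id=abc&dq=isbn:123&source=gbs_api')
# prints http://books.google.com/books?id=abc&source=gbs_api
```

The pattern requires a leading ampersand, so a dq parameter that happens to be the first one in the query string (directly after the question mark) would be left in place.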

- (Object) response_url(service_type, submitted_params)

Catch url_for call for search_inside, because we're going to redirect



# File 'lib/service_adaptors/google_book_search.rb', line 412

def response_url(service_type, submitted_params)
  if ( ! (service_type.service_type_value.name == "search_inside" ))
    return super(service_type, submitted_params)
  else
    # search inside!
    base = service_type.service_response[:url]
    query = CGI.escape(submitted_params["query"] || "")
    # attempting to reverse engineer a bit to get 'snippet'
    # style results instead of 'onepage' style results. 
    # snippet seem more user friendly, and are what google's own
    # interface seems to give you by default. but 'onepage' is the
    # default from our deep link, but if we copy the JS hash data,
    # it looks like we can get Google to 'snippet'.       
    url = base + "&q=#{query}#v=snippet&q=#{query}&f=false"
    return url
  end
end
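
The search-inside URL repeats the query on both sides of the fragment marker. A minimal sketch of just that string assembly follows, assuming a hypothetical helper named search_inside_url; it uses URI.encode_www_form_component as a stand-in for the CGI.escape call in the source, and the base URL is made up.

```ruby
require 'uri'

# Hypothetical helper: builds a deep link that asks Google Books for
# 'snippet' style results for the user's query.
def search_inside_url(base, query)
  q = URI.encode_www_form_component(query.to_s)
  "#{base}&q=#{q}#v=snippet&q=#{q}&f=false"
end

puts search_inside_url('http://books.google.com/books?id=abc', 'white whale')
# prints http://books.google.com/books?id=abc&q=white+whale#v=snippet&q=white+whale&f=false
```

A nil query is treated as an empty string, matching the `|| ""` fallback in the method above.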

- (Object) service_types_generated



# File 'lib/service_adaptors/google_book_search.rb', line 49

def service_types_generated
  types= [
    ServiceTypeValue[:fulltext], 
    ServiceTypeValue[:cover_image],
    ServiceTypeValue[:highlighted_link],
    ServiceTypeValue[:search_inside],
    ServiceTypeValue[:excerpts]]
  types.push(ServiceTypeValue[:referent_enhance]) if @referent_enhance
  return types
end