Class: Gitlab::GithubImport::Client

Inherits:
Object
Includes:
Gitlab::GithubImport::Clients::SearchRepos, Utils::StrongMemoize
Defined in:
lib/gitlab/github_import/client.rb

Overview

HTTP client for interacting with the GitHub API.

This class is essentially a wrapper around Octokit that adds functionality for dealing with rate limiting and parallel imports. Usage is mostly the same as Octokit, for example:

client = GithubImport::Client.new('hunter2')

client.labels.each do |label|
  puts label.name
end

Defined Under Namespace

Classes: Page

Constant Summary

SEARCH_MAX_REQUESTS_PER_MINUTE = 30

DEFAULT_PER_PAGE = 100

CLIENT_CONNECTION_ERROR = ::Faraday::ConnectionFailed

Used/set in the Sawyer agent, which Octokit uses.

RATE_LIMIT_THRESHOLD = 50

The minimum number of requests we want to keep available.

We don’t use a value of 0 as multiple threads may be using the same token in parallel. This could result in all of them hitting the GitHub rate limit at once. The threshold is put in place to not hit the limit in most cases.

SEARCH_RATE_LIMIT_THRESHOLD = 3

Instance Attribute Summary

Instance Method Summary

Methods included from Gitlab::GithubImport::Clients::SearchRepos

#count_repos_by_relation_type_graphql, #search_repos_by_name_graphql

Constructor Details

#initialize(token, host: nil, per_page: DEFAULT_PER_PAGE, parallel: true) ⇒ Client

token - The GitHub API token to use.

host - The GitHub hostname. If nil, github.com will be used.

per_page - The number of objects that should be fetched per page.

parallel - When set to `true`, hitting the rate limit will result in a dedicated error being raised. When set to `false`, we will instead just `sleep()` until the rate limit is reset. Setting this value to `true` for parallel importing is crucial, as otherwise hitting the rate limit will result in a thread being blocked in a `sleep()` call for up to an hour.


# File 'lib/gitlab/github_import/client.rb', line 50

def initialize(token, host: nil, per_page: DEFAULT_PER_PAGE, parallel: true)
  @host = host
  @octokit = ::Octokit::Client.new(
    access_token: token,
    per_page: per_page,
    api_endpoint: api_endpoint,
    web_endpoint: web_endpoint
  )

  @octokit.connection_options[:ssl] = { verify: verify_ssl }

  @parallel = parallel
end

Instance Attribute Details

#octokit ⇒ Object (readonly)

Returns the value of attribute octokit.



# File 'lib/gitlab/github_import/client.rb', line 20

def octokit
  @octokit
end

Instance Method Details

#api_endpoint ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 245

def api_endpoint
  formatted_api_endpoint || custom_api_endpoint || default_api_endpoint
end

#branch_protection(repo_name, branch_name) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 126

def branch_protection(repo_name, branch_name)
  with_rate_limit { octokit.branch_protection(repo_name, branch_name).to_h }
end

#branches(*args) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 118

def branches(*args)
  each_object(:branches, *args)
end

#collaborators(*args) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 122

def collaborators(*args)
  each_object(:collaborators, *args)
end

#custom_api_endpoint ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 253

def custom_api_endpoint
  github_omniauth_provider.dig('args', 'client_options', 'site')
end

#custom_web_endpoint ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 257

def custom_web_endpoint
  return unless custom_api_endpoint

  uri = URI.parse(custom_api_endpoint)
  uri.path = ''
  uri.to_s.chomp('/')
end
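
The derivation above can be tried in isolation with Ruby's standard `URI` library. This is a minimal sketch: the helper name `web_endpoint_from` and the hostname `github.example.com` are hypothetical, and only the three steps (parse, clear the path, strip a trailing slash) mirror `custom_web_endpoint`.

```ruby
require 'uri'

# Hypothetical helper mirroring the steps in custom_web_endpoint:
# parse the API URL, drop its path, and strip any trailing slash.
def web_endpoint_from(api_endpoint)
  return unless api_endpoint

  uri = URI.parse(api_endpoint)
  uri.path = ''
  uri.to_s.chomp('/')
end

# An API endpoint such as ".../api/v3" is reduced to scheme and host.
puts web_endpoint_from('https://github.example.com/api/v3')
```

When no custom API endpoint is configured, the helper returns nil, which is what lets `web_endpoint` fall through to the next candidate in its `||` chain.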

#default_api_endpoint ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 265

def default_api_endpoint
  OmniAuth::Strategies::GitHub.default_options[:client_options][:site] || ::Octokit::Default.api_endpoint
end

#each_object(method, *args, &block) ⇒ Object

Iterates over all of the objects for the given method (e.g. `:labels`).

method - The method to send to Octokit for querying data.

args - Any arguments to pass to the Octokit method.



# File 'lib/gitlab/github_import/client.rb', line 171

def each_object(method, *args, &block)
  return to_enum(__method__, method, *args) unless block

  each_page(method, nil, *args) do |page|
    page.objects.each do |object|
      yield object.to_h
    end
  end
end
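
The block-or-Enumerator pattern used by `each_object` can be sketched without Octokit. Everything here is illustrative: `FakeClient` and its `PAGES` data are hypothetical stand-ins, and pages are plain arrays of hashes rather than API responses.

```ruby
# Hypothetical stand-in for the real client: pages are plain arrays of
# hashes instead of Octokit responses.
class FakeClient
  PAGES = [
    [{ name: 'bug' }, { name: 'feature' }],
    [{ name: 'docs' }]
  ].freeze

  def each_page
    PAGES.each { |page| yield page }
  end

  # Same shape as each_object: return an Enumerator when no block is given,
  # otherwise walk the pages and yield one object at a time.
  def each_object(&block)
    return to_enum(__method__) unless block

    each_page do |page|
      page.each { |object| yield object.to_h }
    end
  end
end

client = FakeClient.new

# Without a block we get an Enumerator, so lazy chaining is possible.
names = client.each_object.lazy.map { |label| label[:name] }.first(2)
p names
```

Returning an Enumerator when no block is given is what makes calls such as `client.labels.each` in the overview work.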

#each_page(method, resume_url, *args) {|Page| ... } ⇒ Enumerator

Fetches data from the GitHub API and yields a Page object for every page of data, without loading all of them into memory.

Parameters:

  • method (Symbol)

    The Octokit method to use for getting the data

  • resume_url (String, nil)

    The GitHub link header URL to resume pagination. When nil, the method will be invoked from the first page

  • args (Array)

    Arguments to pass to the Octokit method

Yields:

  • (Page)

    Each page of data from the API

Returns:

  • (Enumerator)

    When no block is given



# File 'lib/gitlab/github_import/client.rb', line 141

def each_page(method, resume_url, *args, &block)
  return to_enum(__method__, method, resume_url, *args) unless block

  collection = with_rate_limit do
    if resume_url.present?
      octokit.get(resume_url)
    else
      octokit.public_send(method, *args)
    end
  end

  yield Page.new(collection, resume_url)

  next_page = octokit.last_response.rels[:next]

  while next_page
    raise Exceptions::InvalidURLError, 'Invalid pagination URL' unless valid_next_url?(next_page.href)

    response = with_rate_limit { next_page.get }

    yield Page.new(response.data, next_page.href)

    next_page = response.rels[:next]
  end
end
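
The pagination loop can be illustrated with an in-memory stand-in for the API. This is a simplified sketch, not the importer's implementation: `FakePaginator`, its `PAGES` table, and the `api.example.test` URLs are all hypothetical, and rate limiting and URL validation are left out.

```ruby
# Hypothetical stand-in for the GitHub API: each URL maps to one page of
# data plus the URL of the next page (nil on the last page).
class FakePaginator
  Page = Struct.new(:objects, :url)

  FIRST_URL = 'https://api.example.test/labels?page=1'

  PAGES = {
    'https://api.example.test/labels?page=1' =>
      { data: %w[a b], next: 'https://api.example.test/labels?page=2' },
    'https://api.example.test/labels?page=2' =>
      { data: %w[c d], next: 'https://api.example.test/labels?page=3' },
    'https://api.example.test/labels?page=3' =>
      { data: %w[e], next: nil }
  }.freeze

  # Yields one Page at a time. Starts at resume_url when given, otherwise
  # at the first page, and follows "next" links until none remain.
  def each_page(resume_url = nil)
    return to_enum(__method__, resume_url) unless block_given?

    url = resume_url || FIRST_URL

    while url
      response = PAGES.fetch(url)
      yield Page.new(response[:data], url)
      url = response[:next]
    end
  end
end

paginator = FakePaginator.new
paginator.each_page { |page| puts "#{page.url} -> #{page.objects.inspect}" }

# Resuming from a stored URL skips the pages that were already processed.
resumed = paginator.each_page('https://api.example.test/labels?page=2').flat_map(&:objects)
p resumed
```

Because each page is yielded before the next one is requested, only one page of data needs to be held in memory at a time.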

#github_omniauth_provider ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 273

def github_omniauth_provider
  @github_omniauth_provider ||= Gitlab::Auth::OAuth::Provider.config_for('github').to_h
end

#labels(*args) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 106

def labels(*args)
  each_object(:labels, *args)
end

#milestones(*args) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 110

def milestones(*args)
  each_object(:milestones, *args)
end

#parallel? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/github_import/client.rb', line 64

def parallel?
  @parallel
end

#pull_request(repo_name, iid) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 102

def pull_request(repo_name, iid)
  with_rate_limit { octokit.pull_request(repo_name, iid).to_h }
end

#pull_request_review_requests(repo_name, iid) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 87

def pull_request_review_requests(repo_name, iid)
  with_rate_limit { octokit.pull_request_review_requests(repo_name, iid).to_h }
end

#pull_request_reviews(repo_name, iid) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 83

def pull_request_reviews(repo_name, iid)
  each_object(:pull_request_reviews, repo_name, iid)
end

#raise_or_wait_for_rate_limit(message) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 222

def raise_or_wait_for_rate_limit(message)
  rate_limit_counter.increment

  if parallel?
    raise RateLimitError, message
  else
    sleep(rate_limit_resets_in)
  end
end
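
The two strategies can be compared side by side in a small standalone sketch. The `Limiter` class and this `RateLimitError` are hypothetical stand-ins defined only for the example, and the delay is recorded rather than slept so the code finishes immediately.

```ruby
# Hypothetical error class and limiter, used only for this illustration.
class RateLimitError < StandardError; end

class Limiter
  attr_reader :hits, :slept_for

  def initialize(parallel:, resets_in:)
    @parallel = parallel
    @resets_in = resets_in
    @hits = 0
    @slept_for = 0
  end

  # Parallel mode fails fast so that no thread is blocked; sequential
  # mode waits until the limit is expected to reset.
  def raise_or_wait(message)
    @hits += 1

    raise RateLimitError, message if @parallel

    wait(@resets_in)
  end

  private

  # Records the delay instead of calling Kernel#sleep so the sketch runs instantly.
  def wait(seconds)
    @slept_for += seconds
  end
end

sequential = Limiter.new(parallel: false, resets_in: 65)
sequential.raise_or_wait('limit reached')
puts "sequential mode waited #{sequential.slept_for} seconds"

parallel = Limiter.new(parallel: true, resets_in: 65)
begin
  parallel.raise_or_wait('limit reached')
rescue RateLimitError => e
  puts "parallel mode raised: #{e.message}"
end
```

In both modes the hit is counted before the strategy is chosen, matching the order of operations in the method above.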

#rate_limit_counter ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 277

def rate_limit_counter
  @rate_limit_counter ||= Gitlab::Metrics.counter(
    :github_importer_rate_limit_hits,
    'The number of times we hit the GitHub rate limit when importing projects'
  )
end

#rate_limit_resets_in ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 232

def rate_limit_resets_in
  # We add a few seconds to the rate limit so we don't _immediately_
  # resume when the rate limit resets as this may result in us performing
  # a request before GitHub has a chance to reset the limit.
  octokit.rate_limit.resets_in + 5
end

#rate_limiting_enabled? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/gitlab/github_import/client.rb', line 239

def rate_limiting_enabled?
  strong_memoize(:rate_limiting_enabled) do
    api_endpoint.include?('.github.com')
  end
end
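
The check is a plain substring match on the API endpoint, which can be tried on its own. In this sketch the helper name `rate_limited_endpoint?` and the second hostname are hypothetical examples.

```ruby
# Hypothetical helper reproducing the substring check from
# rate_limiting_enabled?. The hostnames below are examples only.
def rate_limited_endpoint?(api_endpoint)
  api_endpoint.include?('.github.com')
end

github_com = rate_limited_endpoint?('https://api.github.com/')
self_hosted = rate_limited_endpoint?('https://github.example.com/api/v3/')

p github_com
p self_hosted
```

Endpoints that do not contain `.github.com` skip the request counting and threshold checks in `with_rate_limit` and go straight to the retry wrapper.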

#releases(*args) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 114

def releases(*args)
  each_object(:releases, *args)
end

#remaining_requests ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 214

def remaining_requests
  octokit.rate_limit.remaining
end

#repos(options = {}) ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 91

def repos(options = {})
  octokit.repos(nil, options).map(&:to_h)
end

#repository(name) ⇒ Object

Returns the details of a GitHub repository.

name - The path (in the form `owner/repository`) of the repository.



# File 'lib/gitlab/github_import/client.rb', line 98

def repository(name)
  with_rate_limit { octokit.repo(name).to_h }
end

#request_count_counter ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 284

def request_count_counter
  @request_counter ||= Gitlab::Metrics.counter(
    :github_importer_request_count,
    'The number of GitHub API calls performed when importing projects'
  )
end

#requests_limit ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 218

def requests_limit
  octokit.rate_limit.limit
end

#requests_remaining? ⇒ Boolean

Returns `true` if we’re still allowed to perform API calls. The Search API has a rate limit of 30 requests per minute, so a lower threshold is used when the reported limit matches the search limit.

Returns:

  • (Boolean)


# File 'lib/gitlab/github_import/client.rb', line 206

def requests_remaining?
  if requests_limit == SEARCH_MAX_REQUESTS_PER_MINUTE
    return remaining_requests > SEARCH_RATE_LIMIT_THRESHOLD
  end

  remaining_requests > RATE_LIMIT_THRESHOLD
end
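
The threshold selection can be exercised as a pure function. In this sketch the limit and remaining counts are passed in as arguments instead of being read from `octokit.rate_limit`, and the limit of 5000 is only an illustrative value for a non-search limit.

```ruby
SEARCH_MAX_REQUESTS_PER_MINUTE = 30
RATE_LIMIT_THRESHOLD = 50
SEARCH_RATE_LIMIT_THRESHOLD = 3

# Hypothetical pure-function version of requests_remaining?: the limit and
# remaining counts are arguments rather than values read from the API.
def requests_remaining?(limit:, remaining:)
  threshold =
    if limit == SEARCH_MAX_REQUESTS_PER_MINUTE
      SEARCH_RATE_LIMIT_THRESHOLD
    else
      RATE_LIMIT_THRESHOLD
    end

  remaining > threshold
end

p requests_remaining?(limit: 5000, remaining: 51) # just above the regular threshold
p requests_remaining?(limit: 5000, remaining: 50) # at the regular threshold
p requests_remaining?(limit: 30, remaining: 4)    # just above the search threshold
p requests_remaining?(limit: 30, remaining: 3)    # at the search threshold
```

The comparison is strict, so a client with exactly 50 requests left (or 3 for search) is already treated as having reached the internal threshold.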

#user(username, options = {}) ⇒ Object

Returns the details of a GitHub user. A 304 (Not Modified) status means the user is cached, so the API won’t return user data.

Parameters:

  • username (String)

    the username of the user.

  • options (Hash) (defaults to: {})

    the optional parameters.



# File 'lib/gitlab/github_import/client.rb', line 73

def user(username, options = {})
  with_rate_limit do
    user = octokit.user(username, options)

    next if octokit.last_response&.status == 304

    user.to_h
  end
end
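
The effect of `next` inside the block can be seen in a reduced sketch. It assumes the wrapper simply returns the block's value; `Response`, `wrapper`, and `user_details` are hypothetical names that stand in for the Octokit response, `with_rate_limit`, and `#user`.

```ruby
# Hypothetical response object and wrapper, standing in for Octokit and
# with_rate_limit respectively.
Response = Struct.new(:status, :body)

def wrapper
  yield
end

# Mirrors the shape of #user: `next` leaves the block early, so the
# wrapper (and therefore this method) returns nil for a 304 response.
def user_details(response)
  wrapper do
    next if response&.status == 304

    response.body.to_h
  end
end

fresh = user_details(Response.new(200, { login: 'octocat' }))
cached = user_details(Response.new(304, nil))

p fresh
p cached
```

Under that assumption, callers need to handle a nil result as "not modified" rather than as a missing user.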

#verify_ssl ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 269

def verify_ssl
  github_omniauth_provider.fetch('verify_ssl', true)
end

#web_endpoint ⇒ Object



# File 'lib/gitlab/github_import/client.rb', line 249

def web_endpoint
  formatted_web_endpoint || custom_web_endpoint || ::Octokit::Default.web_endpoint
end

#with_rate_limit ⇒ Object

Yields the supplied block, responding to any rate limit errors.

The exact strategy used for handling rate limiting errors depends on whether we are running in parallel mode or not. For more information see `#raise_or_wait_for_rate_limit`.



# File 'lib/gitlab/github_import/client.rb', line 186

def with_rate_limit
  return with_retry { yield } unless rate_limiting_enabled?

  request_count_counter.increment

  raise_or_wait_for_rate_limit('Internal threshold reached') unless requests_remaining?

  begin
    with_retry { yield }
  rescue ::Octokit::TooManyRequests => e
    raise_or_wait_for_rate_limit(e.response_body)

    # This retry will only happen when running in sequential mode as we'll
    # raise an error in parallel mode.
    retry
  end
end
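
The rescue-and-retry flow in sequential mode can be sketched with stand-ins. Here `TooManyRequests` and `SequentialClient` are hypothetical, the wait is counted rather than slept, and the internal threshold check and metrics are omitted.

```ruby
# Hypothetical error standing in for ::Octokit::TooManyRequests.
class TooManyRequests < StandardError; end

class SequentialClient
  attr_reader :waits

  def initialize
    @waits = 0
  end

  # Runs the block; on a rate limit error, waits and then re-runs the
  # block via `retry`. A parallel client would raise from the wait step
  # instead, so `retry` would never be reached.
  def with_rate_limit
    begin
      yield
    rescue TooManyRequests
      wait_for_reset
      retry
    end
  end

  private

  # Counts waits instead of sleeping so the sketch finishes immediately.
  def wait_for_reset
    @waits += 1
  end
end

client = SequentialClient.new
attempts = 0

# The block fails twice with a rate limit error and succeeds on the third try.
result = client.with_rate_limit do
  attempts += 1
  raise TooManyRequests if attempts < 3

  :ok
end

p [result, attempts, client.waits]
```

Note that `retry` re-executes the whole block, so the block passed to `with_rate_limit` should be safe to run more than once.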