
GitHub

This connector captures data from GitHub repositories and organizations into Flow collections via GitHub's REST API.

It is available for use in the Flow web application. For local development or open-source workflows, ghcr.io/estuary/source-github:dev provides the latest version of the connector as a Docker image. You can also follow the link in your browser to see past image versions.

This connector is based on an open-source connector from a third party, with modifications for performance in the Flow system. You can find their documentation here, but keep in mind that the two versions may be significantly different.

Supported data resources

When you configure the connector, you specify a list of GitHub organizations and/or repositories from which to capture data.

From your selection, the following data resources are captured:

| Full refresh (batch) resources | Incremental (real-time supported) resources |
|---|---|
| Assignees | Comments |
| Branches | Commit comment reactions |
| Collaborators | Commit comments |
| Issue labels | Commits |
| Pull request commits | Deployments |
| Tags | Events |
| Team members | Issue comment reactions |
| Team memberships | Issue events |
| Teams | Issue milestones |
| Users | Issue reactions |
| | Issues |
| | Project cards |
| | Project columns |
| | Projects |
| | Pull request comment reactions |
| | Pull request stats |
| | Pull requests |
| | Releases |
| | Repositories |
| | Review comments |
| | Reviews |
| | Stargazers |
| | Workflow runs |
| | Workflows |

Each resource is mapped to a Flow collection through a separate binding.

info

The /start_date field is not applicable to the following resources:

  • Assignees
  • Branches
  • Collaborators
  • Issue labels
  • Organizations
  • Pull request commits
  • Pull request stats
  • Repositories
  • Tags
  • Teams
  • Users

Prerequisites

There are two ways to authenticate with GitHub when capturing data into Flow: using OAuth2, or manually, with a personal access token that you generate. Their prerequisites differ.

OAuth2 is recommended for simplicity in the Flow web app; the personal access token method is the only one supported when working from the command line. Which authentication method you choose depends on your organization's policies. GitHub has organization-level settings that must be enabled before users can access repositories that belong to an organization.

Using OAuth2 to authenticate with GitHub in the Flow web app

  • A GitHub user account that has access to the repositories of interest and is a member of the organizations of interest.

  • You may need to request access in GitHub under your personal settings (not the organization settings): go to Applications, then Authorized OAuth Apps. Click the app or the image next to it, and request access under "Organization access". After you make the request, the organization administrator can grant access on the "Third-party application access policy" page. See this GitHub doc for additional details.

Configuring the connector specification manually using personal access token

  • A GitHub user account that has access to the repositories of interest and is a member of the organizations of interest.

  • A GitHub personal access token. You may use multiple tokens to balance the load on your API quota.

  • You may need to ask the organization's administrator to grant access under "Third-party Access", then "Personal access tokens".

Configuration

You configure connectors either in the Flow web app, or by directly editing the catalog specification file. See connectors to learn more about using connectors. The values and specification sample below provide configuration details specific to the GitHub source connector.

Properties

Endpoint

The properties in the table below reflect the manual authentication method. If you're working in the Flow web app, you'll use OAuth2, so some of these properties aren't required.

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /branch | Branch (Optional) | Space-delimited list of GitHub repository branches to pull commits for, e.g. `estuary/flow/your-branch`. If no branches are specified for a repository, the default branch will be pulled. | string | |
| /credentials | Authentication | Choose how to authenticate to GitHub | object | Required |
| /credentials/option_title | Authentication method | Set to PAT Credentials for manual authentication | string | |
| /credentials/personal_access_token | Access token | Personal access token, used for manual authentication. You may include multiple access tokens as a comma-separated list. | | |
| /page_size_for_large_streams | Page size for large streams (Optional) | The GitHub connector captures from several resources with a large amount of data. The page size of such resources depends on the size of your repository. We recommend that you specify values between 10 and 30. | integer | 10 |
| /repository | GitHub Repositories | Space-delimited list of GitHub organizations/repositories, e.g. `estuary/flow` for a single repository, `estuary/*` to get all repositories from an organization and `estuary/flow estuary/another-repo` for multiple repositories. | string | Required |
| /start_date | Start date | The date from which you'd like to replicate data from GitHub in the format YYYY-MM-DDT00:00:00Z. For the resources that support this configuration, only data generated on or after the start date will be replicated. This field doesn't apply to all resources. | string | Required |
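
To illustrate the list-valued fields above, the following is a minimal sketch of an endpoint config that uses only the formats described in the table. The token placeholders are hypothetical, and the repository and branch names are the illustrative ones from the property descriptions, not real requirements.

```yaml
config:
  credentials:
    option_title: PAT Credentials
    # Multiple tokens as a comma-separated list (placeholder values).
    personal_access_token: "<token-1>,<token-2>"
  # Optional; the default shown in the table is 10, recommended range 10 to 30.
  page_size_for_large_streams: 20
  # Multiple repositories, space-delimited.
  # To capture every repository in an organization, use "estuary/*" instead.
  repository: "estuary/flow estuary/another-repo"
  # Optional; repositories without a listed branch use their default branch.
  branch: "estuary/flow/your-branch"
  # Format: YYYY-MM-DDT00:00:00Z
  start_date: 2022-01-01T00:00:00Z
```

Quoting the space- and comma-delimited values is a stylistic choice here that keeps each list visibly a single string.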

Bindings

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /stream | Stream | GitHub resource from which collection is captured. | string | Required |
| /syncMode | Sync mode | Connection method. | string | Required |
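
As a sketch of how these two properties might look across bindings, the fragment below pairs one full refresh resource with one incremental resource. The `assignees` binding mirrors the sample in this document; the stream name `commits` and the sync mode value `incremental` are assumptions inferred from the resource table, so check them against the bindings that the connector discovers for you.

```yaml
bindings:
  # Full refresh (batch) resource, as in the sample specification.
  - resource:
      stream: assignees
      syncMode: full_refresh
    target: ${PREFIX}/assignees
  # Incremental resource; stream name and syncMode value are assumed.
  - resource:
      stream: commits
      syncMode: incremental
    target: ${PREFIX}/commits
```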

Sample

This sample specification reflects the manual authentication method.

captures:
  ${PREFIX}/${CAPTURE_NAME}:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-github:dev
        config:
          credentials:
            option_title: PAT Credentials
            personal_access_token: {secret}
          page_size_for_large_streams: 10
          repository: estuary/flow
          start_date: 2022-01-01T00:00:00Z
    bindings:
      - resource:
          stream: assignees
          syncMode: full_refresh
        target: ${PREFIX}/assignees
      {...}