Sync GitHub Repositories
Introduction
When collaborating with external customers or partners to enhance and customize GitHub repositories, standard branching strategies can break down. Customers often have their own internal private Git servers (like GitHub Enterprise or Bitbucket).
Previously, light-bot utilized a two-branch model (master and sync) that attempted to automatically merge changes between the two environments. This led to frequent and severe merge conflicts when multiple teams attempted to update the same repository simultaneously, as bots cannot intelligently resolve textual conflicts.
To solve this, light-bot has adopted a Hub and Spoke (Fork and Pull) model that strictly separates mirroring from contribution, entirely relying on Pull Requests for code integration.
Architecture and Flow
The new workflow ensures that the customer’s internal Git server acts as a “Spoke” while GitHub remains the “Hub” (Source of Truth).
The SyncGitRepoTask executes hourly and follows this precise flow:
- Mirror Master: The bot replicates the master branch from GitHub to the internal Git master branch. The internal master branch is treated as read-only for customer developers.
- Customer Development: Customer teams create standard feature branches (e.g., feature/custom-login) on their internal Git server.
- Internal Approval: Customers open a Pull Request on their internal Git system to go through their own internal approvals and security checks.
- Handoff (Rename to Sync): Once approved internally, the developer renames their feature branch from feature/custom-login to sync. This acts as a handoff signal for light-bot.
- Replicate to GitHub: During the next hourly job, the bot detects the sync branch and pushes it to GitHub.
- GitHub PR Creation: GitHub utilizes a workflow action to automatically create a Pull Request from sync to master.
- Merge and Cleanup: The core internal team reviews and merges the PR on GitHub.
- Automated Pruning: On the subsequent bot run, light-bot checks if the sync branch exists internally and verifies if its commits are fully merged into master. If they are, the bot automatically drops the sync branch from the internal server, clearing the queue for the next feature.
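The decisions the bot makes on each hourly run can be summarized in a small sketch. This is not the SyncGitRepoTask implementation; the names RepoState, Action, and plan_run are hypothetical, and the ordering (prune if merged, otherwise push) is an assumption drawn from the flow above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    MIRROR_MASTER = "mirror master from GitHub to the internal server"
    PUSH_SYNC = "push sync from the internal server to GitHub"
    PRUNE_SYNC = "delete sync on the internal server"


@dataclass(frozen=True)
class RepoState:
    """Hypothetical snapshot of what the bot observes at the start of a run."""
    internal_has_sync: bool      # a sync branch exists on the internal server
    sync_merged_in_master: bool  # sync's tip is an ancestor of GitHub master


def plan_run(state: RepoState) -> List[Action]:
    """Return the actions for one hourly run, in order."""
    actions = [Action.MIRROR_MASTER]       # master is mirrored on every run
    if not state.internal_has_sync:
        return actions                     # nothing has been handed off
    if state.sync_merged_in_master:
        actions.append(Action.PRUNE_SYNC)  # merged: clear the queue
    else:
        actions.append(Action.PUSH_SYNC)   # not merged yet: replicate to GitHub
    return actions
```

For example, plan_run(RepoState(True, False)) yields the mirror step followed by the push step, while plan_run(RepoState(True, True)) yields the mirror step followed by the prune step.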
Edge Cases and Rules
To ensure this workflow operates smoothly, two strict rules must be observed:
1. Merge Commits Only (No Squash/Rebase)
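To safely detect whether a sync branch can be pruned from the customer’s server, the bot executes git merge-base --is-ancestor sync origin/master. The command succeeds only when the tip of sync is reachable from origin/master, that is, when every commit on sync is already part of master's history.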
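A minimal sketch of how such a check might be wrapped in a script is shown here. The function name sync_is_merged and its parameters are hypothetical and not part of light-bot; the only behavior relied on is that git merge-base --is-ancestor exits with status 0 when the first commit is an ancestor of the second, 1 when it is not, and another non-zero status on error.

```python
import subprocess
from typing import Callable


def sync_is_merged(
    repo_dir: str,
    branch: str = "sync",
    target: str = "origin/master",
    run: Callable = subprocess.run,  # injectable so the logic can be tested without git
) -> bool:
    """Return True when `branch` is an ancestor of `target` in `repo_dir`."""
    cmd = ["git", "merge-base", "--is-ancestor", branch, target]
    result = run(cmd, cwd=repo_dir)
    if result.returncode == 0:
        return True    # every commit on the branch is already in the target
    if result.returncode == 1:
        return False   # the branch has commits the target does not contain
    # Any other status is an error (for example, an unknown ref).
    raise RuntimeError(f"git merge-base failed with status {result.returncode}")
```

Treating error statuses separately from status 1 matters here: a missing ref should not be mistaken for "unmerged", and it should certainly never be mistaken for "merged", since that answer leads to a branch deletion.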
Critical Rule: The core team merging the PR on GitHub MUST use the “Create a Merge Commit” option.
If the team uses “Squash and Merge” or “Rebase and Merge”, GitHub creates new commits with entirely new hashes on master, so the original commits on sync never become part of master's history. As a result, the ancestor check exits with a non-zero status, the bot concludes that the branch is unmerged, and the sync branch is never cleaned up on the customer’s server.
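The effect of each merge option on the ancestor check can be illustrated with a toy commit graph. This is a simplified model, not Git itself: the commit ids (m1, f1, and so on) are made up, and is_ancestor is a hypothetical helper that mimics the reachability test.

```python
from typing import Dict, Tuple

# Toy commit graph: commit id -> tuple of parent ids. All ids are made up.
Graph = Dict[str, Tuple[str, ...]]


def is_ancestor(graph: Graph, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parents."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return False


base: Graph = {
    "m1": (),        # master before the feature
    "f1": ("m1",),   # first commit on sync
    "f2": ("f1",),   # tip of sync
}

# "Create a merge commit": the new master tip has the sync tip as a parent.
merge_commit = {**base, "m2": ("m1", "f2")}

# "Squash and merge": one new commit whose only parent is m1.
squash = {**base, "s1": ("m1",)}

# "Rebase and merge": the changes are replayed as new commits on top of m1.
rebase = {**base, "r1": ("m1",), "r2": ("r1",)}

print(is_ancestor(merge_commit, "f2", "m2"))  # True
print(is_ancestor(squash, "f2", "s1"))        # False
print(is_ancestor(rebase, "f2", "r2"))        # False
```

Only the merge commit keeps the sync tip (f2) in master's ancestry, which is why it is the only option the pruning step can recognize.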
2. Concurrency and Queuing
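Because the bot only looks for a single sync branch as the handoff mechanism, customer teams cannot hand off multiple features at the same time. Teams can still develop on separate feature branches in parallel; only the rename to sync is serialized.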
If Team A renames their branch to sync, Team B must wait until Team A’s PR is merged on GitHub and the bot deletes the sync branch before Team B can rename their feature to sync. This queuing mechanism is intentional; it serializes contributions and prevents massive, difficult-to-resolve merge conflicts across distributed systems.
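The queuing rule behaves like a slot that holds one item at a time. The sketch below models that behavior only; SyncSlot and its methods are hypothetical names, and in practice the slot is simply the existence of the sync branch on the internal server.

```python
from typing import Optional


class SyncSlot:
    """Hypothetical model of the single sync branch as a one-item slot."""

    def __init__(self) -> None:
        self.occupant: Optional[str] = None  # feature currently renamed to sync

    def try_handoff(self, feature: str) -> bool:
        """Rename `feature` to sync if no sync branch exists yet."""
        if self.occupant is not None:
            return False          # another team's sync is still in flight
        self.occupant = feature
        return True

    def prune(self) -> None:
        """The bot deleted sync after the PR was merged on GitHub."""
        self.occupant = None


slot = SyncSlot()
print(slot.try_handoff("feature/team-a"))  # True
print(slot.try_handoff("feature/team-b"))  # False
slot.prune()
print(slot.try_handoff("feature/team-b"))  # True
```

Team B's second attempt succeeds only after the prune step, mirroring the wait described above.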