Handling Internal Tools in Mirrored Repositories
When open-source repositories hosted on GitHub are replicated to internal customer environments (e.g., Bitbucket), customers often need to add internal directories for commercial scanning tools (code coverage, security vulnerabilities, etc.).
This document outlines strategies to keep internal tooling isolated from the upstream public GitHub repository to avoid accidental leaks or merge conflicts.
Strategies for Isolation
1. The “Wrapper Repository” (Git Submodules) — Highly Recommended
Instead of adding the internal commercial tooling directly into the mirrored GitHub repository, the customer can create a brand-new internal Bitbucket repository that acts as a “wrapper.”
- How it works: The customer creates a new Bitbucket repo. They commit their commercial tooling configurations, scan scripts, and extra directories to this repository. Then, they add your GitHub open-source repository as a Git Submodule inside it.
- The pipeline: When their CI/CD pipeline runs, it checks out the wrapper repository, pulls down the submodule (your open-source code), and runs the scanning tools against the submodule directory.
- Why it’s great: Your GitHub repository remains a 100% exact mirror. The customer never has to worry about accidentally pushing internal files back to your GitHub repo, and you never have to worry about merge conflicts during updates.
2. The Upstream/Downstream Branching Strategy
If the customer insists on having the internal files and the open-source code tracked in the exact same repository, they must use a branch isolation strategy.
- How it works:
- The customer maintains a
mainbranch that is a strict mirror of your GitHub repository. Nothing internal is ever committed here. - They create an
internal-mainbranch branched offmain. - They commit their internal scanning directories and files to
internal-main.
- The customer maintains a
- Syncing updates: When you release new code on GitHub, the customer pulls those changes into their
mainbranch, and then runsgit merge maininto theirinternal-mainbranch. - Contributing back: If the customer finds a bug and wants to push a fix back to your GitHub, they branch off
main(notinternal-main), commit the fix, and push that feature branch back to GitHub. - Why it’s great: It utilizes standard Git workflows. The internal directories literally do not exist in the history of the
mainbranch, so they can never be synced back to GitHub.
3. Externalize the Configuration in CI/CD
Often, security and code-coverage tools require a configuration file (like a .sonar folder, Fortify configs, or pipeline YAMLs). Instead of committing these directly to the source code repository, the customer can inject them at runtime.
- How it works: The customer keeps their Bitbucket mirrored repository completely identical to your GitHub repo. They place the scanning tool directories into a second, completely separate Bitbucket repository.
- The pipeline: During their CI/CD build process, the pipeline clones the mirrored open-source repo, then immediately clones the “tooling” repo, copies the required directories into the workspace, and runs the scan.
- Why it’s great: It entirely decouples the source code from the infrastructure/tooling, avoiding any Git history modifications.
4. Gitignore (Only if the directories are generated, not committed)
If the commercial tools simply generate directories (like an out/, coverage/, or reports/ folder) and those folders do not actually need to be tracked in Bitbucket’s version control:
- How it works: Ensure the customer does not run
git addon those directories. - They can add the directories to a global
.gitignoreon their CI/CD build servers, or add.git/info/excludelocally in their Bitbucket repo environment. - Note: You could add these commercial directory names to the
.gitignorefile hosted on your public GitHub. It doesn’t hurt your open-source repo to ignore standard commercial tool outputs (e.g., adding.sonar/to your public.gitignore), and it prevents the customer from accidentally committing them.
Recommendation for Customers
“To ensure seamless updates from our upstream GitHub repository without merge conflicts or accidentally leaking your internal configurations, we recommend not committing your tooling directories directly to the mirrored source code branch. Instead, either keep your tooling in a parent wrapper repository using Git Submodules, or maintain a distinct internal branch that merges updates from our main branch but is never pushed back upstream.”