Managing a GitHub organization with infrastructure as code

Managing a GitHub organization's resources can be complex, regardless of the size of the company. This is particularly the case when there are many teams and repositories to manage, access levels to assign, and users to onboard and offboard.

This post will share our experience in adopting infrastructure as code (IaC) to manage GitHub organization resources.

The organization where this was implemented is an American online retail company based in New York City, made up of 3000+ people, 150+ teams, and 500+ repositories. A few months ago it decided to manage GitHub resources by using IaC with Terraform. In the research phase/proof of concept (PoC) we found a complete Terraform GitHub provider that could help us achieve our goal, since it provides the ability to programmatically manage repositories, organizations, teams, permissions, and projects.

Following the good results obtained with the PoC, we started our journey keeping in mind the challenge of the switch: moving from easy, manual creation and updates of GitHub resources to a controlled, standardized, and programmatic approach. The switch would undoubtedly involve solving multiple technical challenges, as well as communication and evangelization with teams across the organization. Still, we believed the effort would pay off in the long term by making it easier to evolve and by providing transparency into repository configurations, team members, roles, and permissions. It would also accelerate the onboarding and offboarding process related to the organization's codebase.

This post assumes the reader is familiar with Terraform and its providers, so that part will not be covered here. Some useful links are listed in case the reader wants more context:

Terraform intro.
GitHub provider.
Manage GitHub with Terraform.

Defining standard resources with modules

Defining standard resources with modules allows the building of reusable, modular infrastructure code that can be managed as a single unit. This helps to increase efficiency and reduce errors, as well as make it easier to maintain and update infrastructure.

To make GitHub resources easier to use, we defined Terraform modules that group sets of resources for repositories and teams. For example, the GitHub repository module is composed of 4 main resources: github_repository, github_branch_default, github_branch_protection, and github_team_repository.

Figure 1: Repository module composition

To encapsulate those resources, we defined a single module that contains all the required properties, so whenever anyone in the organization wants to create a repository, they only need to declare one module block and fill out the required properties.

#notifications.tf

module "repository_notifications" {
  source = "git::https://github.com/herrera-luis/infra-modules.git//github-repository?ref=v0.0.10"
  #source                 = "../../../infra-modules/github-repository"
  name               = "notifications"
  description        = "The notification service is a platform that sends timely and relevant notifications or messages to users via different communication channels such as emails, SMS and push notifications"
  allow_merge_commit = false
  auto_init          = false
  topics             = ["notifications", "platform", "python"]
  homepage_url       = "https://notifications.inhouse-service.com"
  visibility         = "private"
  default_branch     = "main"
  archived           = false
  lock_branch        = local.lock_branch
  # Permission options are: pull, triage, push, maintain, admin
  team_access = [
    {
      team_id    = local.teams["notifications"].team_id
      permission = local.permissions.admin
    },
    {
      team_id    = local.teams["sre"].team_id
      permission = local.permissions.pull
    },
  ]
  deploy_branch_protection         = true
  branch_protection_enforce_admins = true
  branch_protection_required_pull_request_reviews = {
    dismiss_stale_reviews           = false
    require_code_owner_reviews      = true
    required_approving_review_count = 1
  }
  delete_branch_on_merge     = true
  enable_issues              = true
  enable_downloads           = true
  enable_wiki                = true
  enable_projects            = true
  enable_vulnerabiliy_alerts = true
}

When working with the module, it's crucial to pay attention to the properties that it supports. During the definition phase, we made the decision to standardize the configurations that all repositories within the organization would support. This standardization ensures consistency and simplifies the process of managing the repositories.

Some configurations were removed. Permissions for individual users were dropped and we kept only team permissions, which means every user has to be in a team to get access to the repositories. We also removed GitHub Pages because it was considered a security risk (it could be a way to expose confidential information): the majority of the repositories were created with internal or private visibility, and the repositories that contain web applications (frontend apps) in the development phase are deployed in a private network that can be accessed only through a VPN.

The GitHub team module is composed of 2 primary resources: github_team and github_team_members. We encapsulated those 2 resources in 1 module that we named github-team.

#sre.tf

module "sre_team" {
  source           = "git::https://github.com/herrera-luis/infra-modules.git//github-team?ref=v0.0.10"
  team_name        = "SRE"
  team_description = "The Site Reliability Engineering Team"
  team_privacy     = "closed"
  parent_team_id   = "3118942" # Infrastructure
  team_members = [
    {
      username = github_membership.member["herrera-luis"].username
      role     = "maintainer"
    },
    {
      username = github_membership.member["teammate-1"].username
      role     = "member"
    },
    {
      username = github_membership.member["teammate-2"].username
      role     = "member"
    },
  ]
}

For the team definitions, there was a requirement to implement nested teams in order to reflect the organization chart and simplify permissions management for large groups. In the team module we used the parent_team_id property, which allowed us to build nested teams and gives child teams the ability to inherit the parent's access permissions.

Figure 2: Nested teams
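
To illustrate how inheritance through nested teams plays out, here is a minimal Python sketch. It is not part of our tooling: the team names, the repositories, and the rule that the most permissive grant wins are assumptions made for the example, and the permission order is taken from the comment in the repository module.

```python
# Permission levels from the repository module comment, least to most access.
PERMISSION_ORDER = ["pull", "triage", "push", "maintain", "admin"]


def effective_permission(team, repo, parents, direct_access):
    """Return the highest permission a team holds on a repo, walking up its parents.

    parents:       maps a team name to its parent team name (hypothetical data)
    direct_access: maps (team, repo) to a permission granted directly to that team
    """
    best = None
    seen = set()
    current = team
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental cycles in the input
        granted = direct_access.get((current, repo))
        if granted is not None and (
            best is None
            or PERMISSION_ORDER.index(granted) > PERMISSION_ORDER.index(best)
        ):
            best = granted
        current = parents.get(current)
    return best


# Hypothetical org chart: "sre" is nested under "infrastructure".
parents = {"sre": "infrastructure"}
direct_access = {
    ("infrastructure", "platform-tools"): "maintain",
    ("infrastructure", "notifications"): "push",
    ("sre", "notifications"): "pull",
}

print(effective_permission("sre", "platform-tools", parents, direct_access))
print(effective_permission("sre", "notifications", parents, direct_access))
```

In this toy data the SRE team has no direct grant on platform-tools, yet it ends up with the access of its parent team, which is the behavior that makes nested teams convenient for large groups.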

An important factor to consider when working with the GitHub team resource is that users have to be part of the GitHub organization before they can be added to teams. You therefore have to find a way to map GitHub usernames to the role they will have and add them as members of the organization. In the following section we share how we accomplished it.

Figure 3: User member and team

GitHub user membership map

A user is referenced in the org membership code and in one or more teams. This part of the configuration is a manual process, since you need to request the GitHub username and then add it to the resources list. To facilitate the management of user memberships we created an object map, a data structure that maps keys to values: the keys are the usernames and the values hold the role. After the memberships were deployed we exposed their slug and id so the team module resources could reuse them. Let's see what the user membership map object looks like:

#users.auto.tfvars

users = {
  "herrera-luis" = {
    org_role = "admin"
  }
  "admin-teammate-1" = {
    org_role = "admin"
  }
  "teammate-1" = {
    org_role = "member"
  }
  "teammate-2" = {
    org_role = "member"
  }
}

Since we defined an object map, we can iterate over it in a single Terraform resource instead of declaring the resource multiple times. Terraform provides the for_each meta-argument for this, so we made use of it. This is how the implementation looks:

#user.tf

resource "github_membership" "member" {
  for_each = var.users
  username = each.key
  role     = each.value.org_role
}

#output

output "users" {
  value = {
    for user, userinfo in var.users : user =>
    {
      login = user
      membership = {
        role = userinfo.org_role
      }
    }
  }
}
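
Because every team member has to exist in the users map first, a small pre-check can give faster feedback than waiting for terraform plan to reject an unknown key. The following Python sketch is hypothetical and not part of our setup. It assumes the users map and the team member lists have already been loaded into plain dictionaries, and it uses the role names that appear in the snippets above.

```python
ORG_ROLES = {"member", "admin"}        # org roles used in users.auto.tfvars
TEAM_ROLES = {"member", "maintainer"}  # team roles used in the team module


def validate_memberships(users, teams):
    """Return a list of readable problems; an empty list means everything is consistent."""
    problems = []
    for username, info in users.items():
        if info.get("org_role") not in ORG_ROLES:
            problems.append(f"{username}: invalid org_role {info.get('org_role')!r}")
    for team_name, members in teams.items():
        for member in members:
            if member["username"] not in users:
                problems.append(f"{team_name}: {member['username']} is not in the users map")
            if member["role"] not in TEAM_ROLES:
                problems.append(
                    f"{team_name}: invalid role {member['role']!r} for {member['username']}"
                )
    return problems


users = {
    "herrera-luis": {"org_role": "admin"},
    "teammate-1": {"org_role": "member"},
}
teams = {
    "SRE": [
        {"username": "herrera-luis", "role": "maintainer"},
        {"username": "teammate-9", "role": "member"},  # hypothetical user missing from the map
    ],
}

for problem in validate_memberships(users, teams):
    print(problem)
```

A check like this could run as a pre-commit hook or as an early pipeline step, before any Terraform command is executed.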

Developing an import script

After we defined the modules with the standard properties, another technical challenge was to keep the business running without breaking anything. That meant importing all the GitHub teams and repositories, along with their current configurations, into Terraform code, so that once the import script had finished, running terraform apply should leave everything synchronized. We had 500+ repositories and 150+ teams, and importing them one by one would have been a nightmare, so we chose to write a script to automate the task. Since most of the team has experience with Python, we decided to use it and complement it with libraries.

The first task of the script was to get all the properties of each repository in the GitHub organization. We looked for libraries that could facilitate that process and found PyGithub, a complete library that integrates well with the GitHub APIs.
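
As a rough illustration of this first task, the sketch below collects a handful of repository settings into a plain dictionary. It is not our actual script: the function names and the selection of properties are assumptions, and the demo uses a stand-in object so it runs without network access. The attribute names follow PyGithub's Repository class.

```python
from types import SimpleNamespace


def repository_properties(repo):
    """Collect settings for the repository module from a PyGithub-like repository object."""
    return {
        "name": repo.name,
        "description": repo.description or "",
        # Simplification: this does not distinguish internal from private repositories.
        "visibility": "private" if repo.private else "public",
        "default_branch": repo.default_branch,
        "archived": repo.archived,
        "topics": sorted(repo.get_topics()),
        "enable_issues": repo.has_issues,
        "enable_wiki": repo.has_wiki,
        "enable_projects": repo.has_projects,
    }


def fetch_org_properties(token, org_name):
    """Fetch the properties of every repository in an organization (needs PyGithub)."""
    from github import Github  # imported lazily so the demo below runs without PyGithub

    organization = Github(token).get_organization(org_name)
    return [repository_properties(repo) for repo in organization.get_repos()]


# Stand-in repository object for a dry run, with hypothetical values.
fake_repo = SimpleNamespace(
    name="notifications",
    description=None,
    private=True,
    default_branch="main",
    archived=False,
    has_issues=True,
    has_wiki=True,
    has_projects=True,
    get_topics=lambda: ["python", "notifications"],
)

properties = repository_properties(fake_repo)
print(properties["name"], properties["visibility"], properties["topics"])
```

Keeping the conversion separate from the API call makes it possible to test the mapping on fake objects before pointing the script at a real organization.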

The second task of the script was to generate one Terraform file per repository, so that we could manage each repository and its configuration separately. To accomplish that we used the Jinja template engine library, which proved highly advantageous because it was straightforward to use and file generation was transparent: we defined a template based on the Terraform module, and the template iterated over all the properties of a Python object that we passed to it as a parameter.
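
The idea of rendering one module block per repository can be sketched as follows. To keep the example free of dependencies it uses string.Template from the Python standard library as a stand-in for Jinja, so it is not the template we actually wrote. The chosen properties and the render_repository_file helper are assumptions for illustration.

```python
import json
from string import Template

# Simplified template based on the repository module shown earlier.
MODULE_TEMPLATE = Template('''module "repository_$module_name" {
  source         = $source
  name           = $name
  description    = $description
  topics         = $topics
  visibility     = $visibility
  default_branch = $default_branch
  archived       = $archived
}
''')


def render_repository_file(props, source):
    """Render the Terraform module block for one repository."""
    # json.dumps yields quoted strings, lowercase booleans and bracketed lists,
    # which are also valid HCL literals. It does not escape "${", so values
    # containing that sequence would need extra handling.
    values = {key: json.dumps(value) for key, value in props.items()}
    values["module_name"] = props["name"].replace("-", "_").replace(".", "_")
    values["source"] = json.dumps(source)
    return MODULE_TEMPLATE.substitute(values)


props = {
    "name": "notifications",
    "description": "The notification service",
    "topics": ["notifications", "python"],
    "visibility": "private",
    "default_branch": "main",
    "archived": False,
}
rendered = render_repository_file(
    props,
    "git::https://github.com/herrera-luis/infra-modules.git//github-repository?ref=v0.0.10",
)
print(rendered)
```

Writing the rendered text to a file named after the repository, for example notifications.tf, gives the one-file-per-repository layout described above.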

The last task of the script was to take the generated Terraform files and import the resources into the Terraform state, which means the script had to run the terraform init and terraform import commands behind the scenes. To accomplish that we used the python-terraform library, which provides a wrapper around the terraform command-line tool.
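
The import step can be sketched with the standard subprocess module, used here in place of python-terraform so the example does not depend on that library's API. The resource name "this" inside the module and the helper names are assumptions. The module naming follows the repository_<name> convention from the earlier example, and the import ID is the repository name.

```python
import subprocess


def import_command(repo_name, resource_name="this"):
    """Build the terraform import command for one repository.

    resource_name is the name of the github_repository resource inside the
    module; "this" is a placeholder, since it depends on how the module is written.
    """
    module_name = "repository_" + repo_name.replace("-", "_")
    address = f"module.{module_name}.github_repository.{resource_name}"
    return ["terraform", "import", address, repo_name]


def run_imports(repo_names, working_dir):
    """Run terraform init once, then import each repository into the state."""
    subprocess.run(["terraform", "init", "-input=false"], cwd=working_dir, check=True)
    for repo_name in repo_names:
        subprocess.run(import_command(repo_name), cwd=working_dir, check=True)


# Dry run: only print the command that would be executed.
print(" ".join(import_command("notifications")))
```

The other resources in the module, such as branch protection and team access, would need their own import commands with the ID formats documented by the provider.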

Figure 4: The import script tasks

Once the script had been completed and validated with a sample group of repositories, we went ahead and executed it to import all of the organization's repositories. Generating the Terraform files and importing them into the Terraform state took around 30 minutes. This might seem like a lot of time, but it only happened the first time, when we imported all the resources. After that, when adding new repositories and making subsequent changes, the time required to run terraform plan and terraform apply was cut in half.

Pipeline to automate the deployment of GitHub organization resources

Implementing an automated pipeline to validate and deploy GitHub resource changes is a practical way to reduce manual errors and make the deployment process more efficient, consistent, and reliable. By automating these tasks, organizations can improve their overall development process, increase developer productivity, and reduce the risk of generating undesired changes during deployments. Our goal has always been to encourage the adoption of IaC, allowing any member of the organization to create or update GitHub resources through pull requests (PRs). By implementing an automated pipeline, we were able to boost that adoption and make the process even more streamlined.

In our context, we had been using CircleCI for CI/CD of the services and infrastructure resources, so we used the same vendor to create a pipeline that manages the GitHub organization resources.

Figure 5: Repository pipeline

The process put in place for making changes to the GitHub organization resources is composed of 3 steps. First, the user makes the desired changes to the Terraform files that represent the GitHub resources. Second, the user creates a PR to merge their git branch into the main branch. When a PR is created, a pipeline is triggered and runs the terraform plan command to validate that the changes will not break anything. The PR is ready to be merged once it has at least 1 approval and the terraform plan command returns a successful status. Third, after the PR is merged, a manual approval on the pipeline workflow is required to deploy the changes.
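
The merge gate described above can be expressed as a small piece of logic. The Python sketch below is illustrative only. It assumes the plan step runs terraform plan with the -detailed-exitcode flag, which the process above does not require, and the function names are made up for the example. With that flag, Terraform exits with 0 when there are no changes, 1 on error, and 2 when changes are pending.

```python
def plan_status(exit_code):
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    statuses = {0: "no_changes", 1: "error", 2: "changes"}
    return statuses.get(exit_code, "error")  # treat unknown codes as errors


def ready_to_merge(approvals, plan_exit_code):
    """A PR is mergeable with at least 1 approval and a plan that did not fail."""
    return approvals >= 1 and plan_status(plan_exit_code) != "error"


print(ready_to_merge(approvals=1, plan_exit_code=2))
print(ready_to_merge(approvals=0, plan_exit_code=0))
print(ready_to_merge(approvals=2, plan_exit_code=1))
```

Distinguishing "no changes" from "changes" is optional for the gate itself, but it allows the pipeline to skip the manual approval step when a merged PR has nothing to apply.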

We implemented a manual approval since there is no test or staging environment for GitHub. All changes are shipped directly to the only environment GitHub provides, which can be considered the production environment. With this approach, we think we can reduce undesired changes and avoid breaking GitHub organization resources.

Figure 6: Process for making changes to GitHub resources

Standardizing how the organization manages GitHub resources with IaC

When we started our journey to manage GitHub resources with IaC, everyone within the organization was able to create, update, and delete repositories. Also, a few members with admin roles were able to grant users or teams permissions over repositories. In other words, there was no standard for managing the GitHub resources. Upon completing the migration of all the GitHub resources to Terraform files, we requested changes to manage them in a standardized and transparent way with IaC.

Before applying the changes to resource management, we shared our migration journey and the supported use cases in an internal session called “Tech Team Demo”. It’s a space where the internal tech teams share newly delivered features along with their challenges and benefits. For us, it was a good space to empathize with the tech teams and share the new way to manage GitHub resources.

After our tech team demo presentation, we wanted to standardize the permissions over the repositories by granting access exclusively to teams and not to specific users. To accomplish that goal, we sent multiple communications to the tech leads requesting the names of the repositories their teams needed access to and the permissions they needed over them. After a few weeks, we configured the teams with the permissions the tech leads had shared with us and removed permissions from individual users. With those changes, we were able to standardize permissions over the repositories.

The other part we wanted to standardize was the process to add or remove a user within the GitHub organization and as a member of a GitHub team by using IaC. We shared the new process with the owners of this task and showed them the steps they needed to follow. In the first weeks of using the new process we got some clarification requests, but over time they were able to do it on their own. Of course, we celebrated it, because with that we had standardized the onboarding and offboarding of GitHub organization users and teams.

With the two new standards implemented, we requested the removal of permissions to create, update, or remove repositories, teams, and users, so that any change has to be done through IaC. In the first weeks after the permission removal we got tons of access requests, since multiple users had previously had direct access to repositories or were not part of the correct team. For a few weeks we were busy configuring the right permissions and teams, but after that everything became transparent to the organization's tech members: they now know which teams they belong to and which repositories they own.

Findings & next steps

After implementing new standards to manage the GitHub resources, we were able to find some areas that we could improve. One of them was related to the repositories: judging by their names or descriptions, a few repositories had been created for PoC purposes and the owner forgot to remove them. We also found that a few repositories hadn’t been updated in the last year, so nobody was working on them. Based on those findings, we could archive the identified repositories and ask the relevant tech teams to confirm whether they still need maintenance. If no maintenance is required, we could proceed to remove them.
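
A first pass at spotting such repositories could be automated. The sketch below is a hypothetical helper, not something we have in place: the keywords, the 365-day threshold, and the sample data are assumptions, and each repository is represented as a plain dictionary with its last push date.

```python
from datetime import datetime, timedelta, timezone


def archive_candidates(repos, now, max_age_days=365, keywords=("poc", "proof of concept")):
    """Return (name, reasons) pairs for repositories that look abandoned or temporary."""
    cutoff = now - timedelta(days=max_age_days)
    candidates = []
    for repo in repos:
        # Plain substring match: simple, but it can flag unrelated words too.
        text = f"{repo['name']} {repo.get('description') or ''}".lower()
        reasons = []
        if any(keyword in text for keyword in keywords):
            reasons.append("name or description mentions a PoC")
        if repo["pushed_at"] < cutoff:
            reasons.append(f"no pushes in the last {max_age_days} days")
        if reasons:
            candidates.append((repo["name"], reasons))
    return candidates


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
repos = [
    {"name": "notifications", "description": "Notification service",
     "pushed_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
    {"name": "checkout-poc", "description": None,
     "pushed_at": datetime(2023, 11, 1, tzinfo=timezone.utc)},
    {"name": "legacy-reports", "description": "Old reporting jobs",
     "pushed_at": datetime(2022, 6, 1, tzinfo=timezone.utc)},
]

for name, reasons in archive_candidates(repos, now):
    print(name, "->", "; ".join(reasons))
```

The output is only a list of candidates. The confirmation from the owning teams described above would still decide what actually gets archived or removed.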

On the teams side, we found that some teams were composed of only one or two members, and that these members were also part of multiple teams, which means we could refactor the team composition in the GitHub organization. To go one step further, we could also configure the teams according to the identity provider groups (Okta, Auth0, OneLogin, etc.) used in the organization.

Since we manage hundreds of GitHub resources with IaC, the pipeline can be slow when executing the terraform plan and terraform apply commands. We could improve performance by grouping the repositories into the most used, the less used, and the archived ones, and configuring a separate pipeline to manage each group.

Sometimes, in critical periods related to sales, the business needs to apply a code freeze to the entire organization, which means that nobody can merge PRs to the main branch or deploy changes to the production environment. It would be nice to take advantage of the repository configurations that we have available to enable and disable that feature when the business requires it.

The pipeline first runs the terraform plan command and then waits for manual approval to deploy the changes. Sometimes we forgot to press the approval button because we had moved on to another activity. It would be nice to implement a notification sent to the team group chat, letting the team know that the pipeline is waiting for approval.
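
One possible shape for such a reminder is a small script that posts to a chat webhook. The sketch below only builds the request. The webhook URL, the workflow URL, and the JSON payload with a single "text" field are assumptions that depend on the chat tool in use.

```python
import json
import urllib.request


def build_approval_reminder(webhook_url, workflow_url, waiting_minutes):
    """Build (but do not send) a POST request reminding the team of a pending approval."""
    payload = {
        "text": (
            "Terraform apply for GitHub resources has been waiting for approval "
            f"for {waiting_minutes} minutes: {workflow_url}"
        )
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URLs for illustration only.
request = build_approval_reminder(
    "https://chat.example.com/hooks/placeholder",
    "https://ci.example.com/workflows/placeholder",
    waiting_minutes=30,
)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["text"])
```

Sending the reminder would be a matter of passing the request to urllib.request.urlopen from a scheduled job or a pipeline step that runs while the approval is pending.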

Final thoughts

This journey of adopting IaC to manage GitHub organization resources has taught us multiple lessons. One of them is the ability to iterate constantly. Specifically, when we were designing our modules for repositories and teams, we ran into many different configurations, and to cover all the GitHub resources we had to modify our modules several times.

Another important lesson was the value of communication and evangelization with other technical teams that, at the beginning, may have resisted the new way of modifying GitHub resources. Nowadays, they’ve adopted the proposed standards because of the transparency and control these standards give them.

I hope this post can guide you or provide you with insights if you are thinking about managing your GitHub organization resources with IaC.