Security

GitHub secrets reveal API keys, usernames, passwords, and more exposed

Tuesday, March 30, 2021

GitGuardian’s constant monitoring of every single commit pushed to public GitHub indicates an alarming 20% year-over-year growth in the number of GitHub secrets found. Over two million secrets were detected on public GitHub in 2020. A growing volume of sensitive data, or secrets, such as API keys, private keys, certificates, usernames, and passwords ends up publicly exposed on GitHub, putting corporate security at risk, as the vast majority of organizations are either ignoring the problem or poorly equipped to cope with it.

Commentary from Carole Winqwist, CMO of GitGuardian

Over two million secrets were detected on public GitHub in 2020, and this number is growing 20% year over year, GitGuardian’s State of Secrets Sprawl on GitHub report shows.

This growing volume of sensitive data, or secrets, such as API keys, private keys, certificates, usernames, and passwords ends up publicly exposed on GitHub, putting corporate security at risk, as the vast majority of organizations are either ignoring the problem or poorly equipped to cope with it.

One of the major findings of this report is that 85% of these secrets are found in developers’ personal repositories, outside of corporate control. In fact, organizations often have no visibility into these public personal repositories, let alone the authority to enforce any kind of preventive security measures.

Companies need to tackle this issue and be able to scan not only public repositories but also private repositories to prevent lateral movement by malicious actors.

We caught up with CEO and co-founder Jérémy Thomas recently. He shares the complete results, including the types of secrets that were found.

ADM: You are constantly monitoring every single commit pushed to public GitHub. What do you find?

Thomas: GitGuardian has been monitoring every single commit pushed to public GitHub since July 2017. In 2020, that was almost 1 billion public commits, or about 2.5 million commits every day. Over 2 million secrets (API keys, usernames and passwords, or security certificates) were detected, more than 5,000 secrets a day. And this number grew 20% compared to the previous year.

ADM: What finding surprised you the most?

Thomas: When you ask developers whether hard-coding credentials is a bad practice, they will of course confirm that it is. This is why the first thing that is counterintuitive and surprising is the order of magnitude of the findings. We are talking about millions of secrets available in the public space. And this is only looking at public GitHub; the issue grows even larger if you consider organizations’ private repositories.

The second finding, which makes this exposure truly critical, is that only 15% of leaks on GitHub occur within public repositories owned by organizations. 85% of the leaks occur in developers’ personal repositories. You could argue that this is not a major problem, as these would be personal secrets, but unfortunately the secrets present in these repositories can be either personal or corporate. And this is where the risk lies for organizations, as some of their corporate secrets are exposed publicly through the personal repositories of their current or former developers.

ADM: What are these GitHub secrets exactly?

Thomas: A secret can be any sensitive data that we want to keep private. In the context of software development, secrets generally refer to digital authentication credentials that grant access to services, systems, and data. These are most commonly API keys, usernames and passwords, or security certificates. Secrets are what tie together the different building blocks of a single application by creating a secure connection between each component. Secrets grant access to the most sensitive systems. They are the keys to the kingdom, in a sense.

ADM: Why would a developer leak a secret?

Thomas: Usually these leaks are unintentional and not malevolent. They happen because developers typically have one GitHub account that they use for both personal and professional purposes, sometimes mixing the repositories. Developers are also handling more and more secrets, which they need in order to programmatically connect the components of their applications. Technically speaking, it is also easy to misconfigure git and push the wrong data, and it is easy to forget that the entire git history is still publicly visible even if sensitive data has since been deleted from the current version of the source code.
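
The point about git history can be illustrated with a minimal Python sketch. It is not GitGuardian's detection method: the history is modeled as a hypothetical list of file snapshots rather than read from a real repository, the single pattern only matches the published format of AWS access key IDs, and the key shown is the placeholder used in AWS documentation. The function names are assumptions made for this example.

```python
import re

# Illustrative detector: AWS access key IDs are "AKIA" followed by 16
# uppercase letters or digits. Real scanners combine many such detectors.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_secrets(text):
    """Return every substring of text that looks like an AWS access key ID."""
    return AWS_KEY_ID.findall(text)


# Hypothetical history of one file, oldest commit first.
history = [
    'AWS_KEY = "AKIAIOSFODNN7EXAMPLE"\n',            # commit 0: key hard-coded
    'import os\nAWS_KEY = os.environ["AWS_KEY"]\n',  # commit 1: key "removed"
]


def scan_head(snapshots):
    """Scan only the latest version of the file."""
    return find_secrets(snapshots[-1])


def scan_history(snapshots):
    """Scan every version and report (commit index, match) pairs."""
    hits = []
    for index, content in enumerate(snapshots):
        for match in find_secrets(content):
            hits.append((index, match))
    return hits


print(scan_head(history))     # nothing found in the current version
print(scan_history(history))  # the key is still present in commit 0
```

Scanning only the latest version reports nothing, while scanning every snapshot still surfaces the key from the earlier commit, which is why deleting a secret in a later commit does not undo the leak.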

ADM: You say the number of secrets found is growing. Why is that?

Thomas: We think the growth is due to two factors: the increase in GitHub usage and the move towards cloud architectures and componentization. To compound the problem, companies are pushing for shorter release cycles, developers have many technologies to master, and the complexity of enforcing good security practices increases with the size of the organization, the number of repositories, the number of developer teams, and their geographical spread.

ADM: What are the risks linked to the exposure of these secrets?

Thomas: Storing secrets in public repositories is very dangerous because they are freely available to everyone on the internet. It is very easy to monitor public repositories: GitHub has a public API for fetching all public commits, and malicious users can use it too. The risk is that secrets found on public GitHub can not only be used to access sensitive data or systems, but can also provide access to additional secrets and be used to move laterally to other systems. Data loss, infrastructure deletion, and misuse of legitimate identities, to name a few, then become possible.

ADM: Why do you think so many companies are ignoring the problem?

Thomas: Many companies put centralized secrets management systems in place, thinking that protecting their secrets in vaults will be sufficient. But it is not. These systems are typically not deployed across the whole perimeter, and they are not coercive: they do not prevent developers from hardcoding credentials that are stored in the vault. The reality is that most organizations are operating blind. Most leaks of organizations’ credentials on public GitHub occur in developers’ personal repositories, where organizations often have no visibility, let alone the authority to enforce any kind of preventive security measures.
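
To make the hardcoding point concrete, here is a small Python sketch of the alternative: the credential is read at runtime instead of being written into the source file. The variable name SERVICE_API_KEY and the function load_api_key are hypothetical, and the environment merely stands in for whatever mechanism a vault uses to deliver secrets. As noted above, nothing in a pattern like this stops a developer from pasting the value into the code anyway.

```python
import os


def load_api_key(env=os.environ, name="SERVICE_API_KEY"):
    """Read the key from the environment at runtime, not from source code.

    `name` is a hypothetical variable name chosen for this example.
    """
    key = env.get(name)
    if not key:
        # Fail loudly rather than fall back to a hard-coded default.
        raise RuntimeError(f"{name} is not set")
    return key


# Usage with an explicit mapping, so the example runs anywhere.
print(load_api_key({"SERVICE_API_KEY": "value-injected-at-deploy-time"}))
```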

ADM: What should companies do to avoid the risk of secrets exposure?

Thomas: Solutions are available that let them automate secrets detection and put the proper remediation in place. Companies need to scan not only public repositories but also private repositories, to prevent lateral movement by malicious actors after they have gained initial access to a repository containing secrets.

Developer training programs should also be put in place, although these do not eradicate the risk of leaked credentials.

Following best practices is not sufficient, and companies need to secure the SDLC with automated secrets detection. When choosing a secrets detection solution, they need to take into account:

Capacity to monitor developers’ personal repositories (for the public side)
Secrets detection performance: accuracy, precision, and recall
Real-time alerting
Integration with remediation workflows
Easy collaboration between developer, threat response, and ops teams
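
For the detection performance criterion, precision and recall can be computed from a labeled evaluation set, as in the Python sketch below. The counts are hypothetical and are not taken from the report; they only show how the two metrics are defined.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall for a secrets detector.

    precision: share of raised alerts that were real secrets
    recall:    share of real secrets that raised an alert
    """
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical evaluation: 80 real secrets flagged, 20 false alarms,
# 10 real secrets missed.
precision, recall = precision_recall(80, 20, 10)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision means security teams are flooded with false alerts, while low recall means real leaks go unnoticed, so the two have to be weighed together.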

ADM: Isn't GitHub offering its own secrets detection solution?

Thomas: GitHub’s secret scanning capabilities are included in the GitHub Advanced Security license. GitHub Advanced Security is sold as an add-on to a standard GitHub Enterprise license, and it is a platform that includes multiple security features. The disadvantage of this platform approach is that you cannot pick specific security vendors that have more in-depth coverage in their own discipline, for example Snyk for dependency scanning and GitGuardian for secrets scanning. So the decision is between wanting the best possible coverage and dealing with multiple vendors, or dealing with a single vendor with basic capabilities.

ADM: Secret exposure is not just a public GitHub issue. What about private repositories?

Thomas: This is true, and it is probably even more critical. When developers feel safer behind the closed doors of private repositories, they tend to choose the path of least resistance when handling secrets, which may include hardcoding them into source code, distributing them through email or messaging systems, saving them directly into config files, and storing them inside internal wikis. The danger may not be immediately apparent, as all these systems still have some level of access control. But once secrets start to enter different systems, companies lose control over where their secrets end up and who has access to them, and they lose visibility over where their secrets are.

About Jérémy Thomas

Jérémy Thomas, co-founder and CEO of GitGuardian, is an engineer and an entrepreneur. He graduated from Ecole Centrale in Paris. He first worked in finance and then began his entrepreneurial journey by founding Quantiops, a consulting company specializing in the analysis of large amounts of data, followed by GitGuardian in 2017. GitGuardian, a cybersecurity start-up co-founded with Eric Fourrier, has been pursuing a strong growth trajectory since 2017, supported by investors such as Balderton Capital, BPI France, Scott Chacon, co-founder of GitHub, and Solomon Hykes, founder of Docker.