By Jason Ritzke – Systems Engineer
…or how I stubbed my toe on the keys to a few dozen kingdoms
I’m authoring this security write-up in ReStructuredText. An odd way to start, I know, but bear with me: I promise that ReStructuredText is relevant to a recently patched major vulnerability I uncovered in GitLab. Hopefully, that’s a strange enough sentence to capture your attention. If not, I promise cookies 35% of the way through this document.
ReStructuredText is my favorite lightweight plaintext markup language. I tend to prefer it over the much more popular Markdown because it is better standardized and its base feature set is more powerful. It is also the standard documentation markup of the Python community, which makes it an easy choice for Python coders such as myself.
Because of this, I have often used the Pelican static site generator to build websites. It is a tool that takes ReStructuredText as input and outputs static HTML websites. These websites are fast, secure, and simple to administer. I turned to Pelican again when creating a site for my wife’s new webcomic, herpaderp.party. It was during the process of creating this site that I stumbled across something that I can only reasonably describe as “kind of a big deal.”
While building the feeds page for herpaderp.party, I needed the ReStructuredText raw directive to force raw HTML into a page. The unrendered source looks like this:
```rst
.. raw:: HTML

   <form method='post' action='https://blogtrottr.com'>
     Your email:
     <input type='text' name='btr_email' />
     <input type='hidden' name='btr_url' value='https://herpaderp.party/feeds/strips.a…' />
     <input type='hidden' name='schedule_type' value='6' />
     <input type='submit' value='Follow this feed via email' />
   </form>
```
This produces the email subscription form you see on that page, and it is a relatively clean solution to the problem I was facing. What shocked me, however, was what happened after I committed the code and pushed it to GitLab. The raw HTML I had inserted was not displayed as code. Instead, GitLab rendered it as the working form and button that the code defines.
All of the major web-based SCM platforms (GitHub, Bitbucket, GitLab, and Phabricator) can preview the rendered output of popular markup formats such as ReStructuredText, Markdown, and AsciiDoc. At this point it is practically a required feature in the space. It enables the richly formatted README files that people have come to expect from the web frontends of their SCM solutions. However, the ability to insert raw HTML into that output also brings the ability to rewrite the website itself as it is presented to other users. This was when my concern started to bubble to the surface.
Pelican rendering my raw directive is totally acceptable: it’s my site, with my content, parsed by me. GitLab rendering this HTML is another matter entirely, because it serves one user’s content to every other user who views the repository. As a general rule, a website should never allow user-supplied HTML to run on the site, especially without sanitization. The Samy worm illustrated the reasons well when it spread across MySpace in 2005 through user-supplied HTML that slipped past the site’s filters. This write-up may serve as a refresher on the dangers.
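To make “sanitization” concrete, here is a minimal allowlist sketch that uses only Python’s standard-library `html.parser`. The tag and attribute allowlists, the rule that links must be `http(s)`, and the names `AllowlistSanitizer` and `sanitize` are all illustrative assumptions. None of this is what GitLab or any other platform ships, and production code should rely on a maintained sanitizer library rather than a hand-rolled one.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlist only (an assumption, not any platform's real policy).
ALLOWED_TAGS = {"p", "em", "strong", "code", "pre", "ul", "ol", "li", "a", "h1", "h2", "h3"}
ALLOWED_ATTRS = {"a": {"href"}}  # every allowlisted attribute here is a URL
DROPPED_WITH_CONTENT = {"script", "style"}  # discard the element and its text


def is_safe_url(value):
    """Accept only absolute http(s) URLs, which rules out javascript: and data: links."""
    return value.strip().lower().startswith(("http://", "https://"))


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive decoded in text/attrs
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags (form, input, iframe, ...) vanish, their text is kept
        kept = "".join(
            f' {name}="{escape(value)}"'
            for name, value in attrs
            if name in ALLOWED_ATTRS.get(tag, set()) and value and is_safe_url(value)
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROPPED_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # re-escape text so it cannot form markup


def sanitize(fragment):
    """Return an HTML fragment containing only allowlisted tags and attributes."""
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(sanitize("<p>Hi <em>there</em></p><script>alert(1)</script>"))  # → <p>Hi <em>there</em></p>
print(sanitize('<a href="javascript:alert(1)" onclick="x()">link</a>'))  # → <a>link</a>
```

Run through a filter like this, the subscription form from earlier would keep only its “Your email:” label. Comments, doctypes, and every event-handler attribute are dropped simply because they are never copied to the output. That is the appeal of an allowlist over a blocklist.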
The customary first probe for this kind of hole is a harmless alert box:

```javascript
alert('This is an alert to test if this site may be pwnable');
```
Cookies are delicious, except when you can’t eat them…
In general, GitLab has done a good job of securing their application. This matters to me because I regularly recommend it to clients and friends (and will continue to do so) and use it almost exclusively to host my code. I’m a fan. Thanks to that care, a script on the page cannot simply “grab” the user’s session cookie and send it to a remote location. GitLab’s session cookie is flagged HttpOnly, which hides it from page scripts.
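The HttpOnly cookie attribute is the standard way a site stops page scripts from reading a cookie through `document.cookie`. The browser still sends the cookie along with every request. The sketch below builds such a `Set-Cookie` header with Python’s standard `http.cookies` module. The cookie name is modeled on GitLab’s session cookie, and the value is a placeholder; treat both as illustrative rather than GitLab’s actual configuration.

```python
from http.cookies import SimpleCookie

# Illustrative session cookie; name and value are placeholders.
cookie = SimpleCookie()
cookie["_gitlab_session"] = "0123abcd"
cookie["_gitlab_session"]["httponly"] = True  # not readable via document.cookie
cookie["_gitlab_session"]["secure"] = True    # only sent over HTTPS
cookie["_gitlab_session"]["path"] = "/"

# Prints a header such as "Set-Cookie: _gitlab_session=0123abcd; HttpOnly; ..."
header = cookie.output()
print(header)
```

HttpOnly hides the cookie’s value, but it does not stop the browser from attaching the cookie to requests the page makes to its own site. That distinction matters for what follows.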
If we had been able to perform that operation, it would have been trivial to fully impersonate every single user who viewed the project’s README file. That would be as dangerous as a standard phishing attack, but from the user’s perspective it has none of the giveaways. It comes from a trusted URL and displays no popups asking for permission or access. So we’re very lucky that such an attack is not permitted. Unfortunately, this protection becomes immaterial once your code is already running in the browser on the very site you want to act against. The script never needs to read the cookie, because the browser attaches it to every request the script sends to that site.
AJAX the great
AJAX (Asynchronous JavaScript and XML) is the technique behind the vast majority of modern websites. Fundamentally, it lets page scripts make HTTP requests in the background, so pages can fetch and populate content without requiring the user to refresh. Your Facebook and Twitter feeds, Google Maps: practically every popular website uses AJAX extensively.
Anatomy of a POC
The POC exploit first makes an HTTP GET request for the user’s dashboard page and indexes the entire list of projects to which they have access. It then fetches the member management page for each of those projects (more GET requests) and retrieves a security token from the forms on those pages. This token is required for the final step and is designed to mitigate CSRF (cross-site request forgery). Our attack, however, is not cross-site at all. It can simply request the page containing the token, just as the user’s normal use of the application would.
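The token logic above can be sketched in a few lines of standard-library Python. Everything here is hypothetical. `render_members_form` stands in for GitLab’s member management page. `authenticity_token` follows the usual Rails naming convention rather than a confirmed GitLab field. The check is a simplified per-session comparison. The sketch shows that a CSRF token only stops code that cannot read the page; code on the same origin can fetch the form and copy the token like any other field.

```python
import hmac
import secrets
from html.parser import HTMLParser


class TokenExtractor(HTMLParser):
    """Collects the value of the <input> whose name matches field_name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == self.field_name:
            self.token = attrs.get("value")


def render_members_form(session_token):
    # Hypothetical server-rendered form embedding the per-session CSRF token.
    return (
        '<form method="post" action="/members">'
        f'<input type="hidden" name="authenticity_token" value="{session_token}" />'
        '<input type="submit" value="Add member" />'
        "</form>"
    )


def csrf_check(session_token, submitted_token):
    # Constant-time comparison of the session's token with the submitted one.
    return submitted_token is not None and hmac.compare_digest(session_token, submitted_token)


session_token = secrets.token_urlsafe(32)
page = render_members_form(session_token)

# Same-origin script: it can fetch the page, so it can read the token.
reader = TokenExtractor("authenticity_token")
reader.feed(page)
print(csrf_check(session_token, reader.token))  # → True

# Cross-site page: the same-origin policy keeps it from reading the response,
# so the best it can do is guess.
print(csrf_check(session_token, "guessed-token"))  # → False
```

The token separates same-origin requests from cross-site ones. It cannot separate the legitimate user from a script acting on their behalf within the same site.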
The final and most critical step of this POC is the actual exploit, the “what” that occurs after the “how”. In this case, the final step is an HTTP POST request to the user management endpoint that adds the attacker’s account to the project with full (master) permissions.
The simple attack…extended
The ability to execute drive-by requests on behalf of a user is dangerous on a fundamental level, but on a platform like GitLab, the problem expands because it becomes possible to make the exploit self-replicating. In short, it can be extended into a worm.
Suppose the account that has been granted master access to every project then adds the exploit code to each project’s README. Every new viewer would then spread it to yet more projects. Depending on the viewing rate, it would rapidly become resident in the majority of projects on the platform, and the entire process could, of course, be automated. Private projects would also be reachable this way, so every repository that users considered “secure” would suddenly be exposed. Secret keys, credentials, and proprietary code would all be available to the attacker.
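The spread described above can be modeled, very roughly, as reachability over the graph of which users can access which projects. This is a toy model under loud assumptions. Every user eventually views every infected project they can access, and each view copies the payload into all of that user’s projects. The model ignores viewing rate entirely, so it gives the worm’s eventual reach rather than how fast it gets there. The function `infected_projects` and the sample `access` map are hypothetical.

```python
from collections import deque


def infected_projects(access, seed):
    """Return every project the toy worm eventually reaches from `seed`.

    access maps each user to the set of projects they can view.
    Toy-model assumption: a user viewing an infected README causes the
    payload to be copied into every project that user can access.
    """
    # Invert the access map: project -> users who can view it.
    viewers = {}
    for user, projects in access.items():
        for project in projects:
            viewers.setdefault(project, set()).add(user)

    infected = {seed}
    queue = deque([seed])
    while queue:  # breadth-first search over project -> viewer -> project
        project = queue.popleft()
        for user in viewers.get(project, ()):
            for other in access[user]:
                if other not in infected:
                    infected.add(other)
                    queue.append(other)
    return infected


# Hypothetical access map: alice and bob share project "b"; carol is isolated.
access = {"alice": {"a", "b"}, "bob": {"b", "c"}, "carol": {"d"}}
print(sorted(infected_projects(access, "a")))  # → ['a', 'b', 'c']
```

Even in this tiny graph, one shared project is enough to carry the payload from alice’s projects into bob’s. Only carol, who shares nothing, stays clean.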
It’s therefore clear that even this small parsing vulnerability can, in the right circumstances, become a very serious security issue. But how did it happen in the first place? How did GitLab come to be vulnerable to this attack?
The bug is always upstream
It turns out that GitLab’s parsing of ReStructuredText documents is performed by a Ruby gem called gitlab-markup. This gem was forked from github-markup, an open-source gem created by GitHub. GitLab uses it to invoke docutils, the Python library that performs the actual rendering.
The github-markup gem performs this rendering via a wrapper script called rest2html, which explicitly enables raw directives. That behavior comes from commit 68557d2, which flipped this switch from false to true; the commit message rather ironically reads “enable raw HTML”. I’ve already opened issue 981 with that project, but have not yet received a response as to why this was not considered a risk. From my rudimentary testing, GitHub appears to block raw:: HTML entirely. So either this is no longer the code they run on their systems, or they perform some additional sanitization later on.
What about everyone else?
Even if GitHub is protecting itself adequately, the vast number of downstream projects that use this code remain a concern. A full 708 repositories on GitHub alone are listed as dependents. There may be many others that are not hosted on GitHub, or that use this upstream module in a way GitHub’s dependency graph does not catch.
For all of these applications, it is vital that their authors carefully consider whether they are using this code safely. Does it render user-supplied content back to other users, and if so, is that output sanitized before use? My recommendation is to heed the warnings of the docutils authors and disable the raw directive entirely for any user-supplied data. Its own documentation calls it a potential security hole. Heck, if you’re foolish enough to leave file insertion enabled, the raw directive’s file and URL options let a document pull in files from the system running the parser, and URLs reachable from it. Terrifying.
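In docutils itself, disabling raw is a small change. The sketch below renders untrusted reStructuredText with `docutils.core.publish_parts`, with the `raw_enabled` and `file_insertion_enabled` runtime settings turned off. It assumes docutils is installed. The settings dictionary and the helper name `render_untrusted_rst` are my own illustration, not the configuration used by GitLab, gitlab-markup, or rest2html.

```python
from docutils.core import publish_parts

# Illustrative hardened settings for rendering untrusted reStructuredText.
SAFE_SETTINGS = {
    "raw_enabled": False,             # "raw" directive and raw-derived roles are refused
    "file_insertion_enabled": False,  # no "include", no :file:/:url: options
    "report_level": 5,                # keep docutils' warnings out of the rendered HTML
}


def render_untrusted_rst(source: str) -> str:
    """Render user-supplied reST to an HTML body fragment with raw HTML disabled."""
    parts = publish_parts(source, writer_name="html", settings_overrides=SAFE_SETTINGS)
    return parts["body"]


untrusted = (
    ".. raw:: html\n"
    "\n"
    "   <script>alert('pwned')</script>\n"
    "\n"
    "Hello *world*.\n"
)

# docutils' defaults leave raw enabled, so the script passes straight through.
default_body = publish_parts(untrusted, writer_name="html")["body"]
print("<script>" in default_body)

# With the hardened settings the directive is dropped and the prose survives.
safe_body = render_untrusted_rst(untrusted)
print("<script>" in safe_body, "<em>world</em>" in safe_body)
```

Setting `report_level` to 5 silences docutils’ “directive disabled” warnings so they are not echoed into the page. Drop that line if you would rather show authors why their raw block vanished. Sanitizing the rendered HTML is still a sensible second layer on top of this.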
GitLab is to be lauded for their incredibly professional and speedy response. They got back to me within 24 hours of my initial disclosure and had a provisional patch within 24 hours after that. The vulnerability was publicly announced on 2017/01/10 with a patch that disables the raw:: HTML directive entirely.
It is a sad but inescapable truism that code will have bugs. What matters is that we find them, fix them, and learn from them to keep ourselves safer over time. Speaking of which…
Lessons to learn
When you inherit code from an upstream source, it is critical to understand that you also inherit that code’s assumptions about the security model of its environment. github-markup may be secure within the GitHub and GitHub Enterprise software stack, but it is not guaranteed to be secure outside of it. A library being used by a well-known vendor that is generally “considered secure” does not mean you can incorporate it into your product without careful evaluation.
As we develop software, we must inspect and test not only the code (both what we write and what we inherit) but also the application once all of that code has been assembled and installed. It is obviously infeasible to validate every line of code in an operating system just to write an app that runs atop it. Even so, there is no substitute for aggressive testing of your application once it has been stood up. Qualified professionals can help you discover the chinks in your armor before somebody nefarious starts holding your users for ransom and damaging your reputation.
Thanks
- Taos: for giving me a good place to talk about this
- GitLab: for being so much more professional than most upon being informed of a flaw
- DC562 (my Long Beach DEF CON group): for helping me learn how to hack properly
- My wife: for putting up with my perpetual lack of sleep while I fix broken pieces of computer