Plain text considered harmful: A cross-domain exploit

referer:http://balpha.de/2013/02/plain-text-considered-harmful-a-cross-domain-exploit/

Data from around the world

The same origin policy prevents a website's JavaScript from seeing the result of a request made to a different domain. This is essential because that request would send along any cookies stored for that domain. Without the policy, if you were authenticated on the other site and visited a malicious site, the evil page could request, say, your account balance summary from the other site and read the result.

Note that the same origin policy doesn't necessarily prevent the request per se – it just prevents the response from being accessible. A malicious site can e.g. just redirect your browser, or submit a form, or include an image or an iframe – in all those cases a request is made to your site; the evil site just doesn't see the response. It doesn't have access to the iframe's DOM, painting a cross-domain image to a canvas will taint it, and so on. In some cases, it's possible for a site to say that it's okay to see the data on a different domain, but this isn't the default.

Let's look at <script> elements now. If such an element's src attribute is set, then the browser will load the script and execute it in the current page. It will do this regardless of origin (otherwise, you e.g. couldn't serve jQuery from Google's CDN). It's not possible for a site to read the content of a script that was loaded from a different domain, but any side effects of executing the script are very visible – like the jQuery object suddenly existing in your page.

This cross-domain script loading is what JSONP utilizes. If you have, say, a public API where you're entirely okay with the data being accessible cross-domain, you can enable other sites' JavaScript to use your data by returning JSONP. This essentially means you're returning JavaScript (which, as explained above, will be executed despite the different origin). The consuming site lets you know that it has prepared a function for your script to call, and what that function's name is. You return a JavaScript file that calls this very function with your data.

function theCallback(data) {
    alert("The temperature is " + data.temperature_celsius + " °C");
}

var script = document.createElement("script");
script.src = "http://some-weather-service.com/api?city=Boston&callback=theCallback";
document.head.appendChild(script);

The callback parameter tells the weather service what function to call in its JSONP response, and thus the API's response looks like this:

theCallback({"temperature_celsius":20,"temperature_fahrenheit":68})

which makes your site work as intended.
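On the server side, producing such a response can be as simple as the following sketch. The helper name buildJsonpResponse is hypothetical, and the check on the callback name is an added precaution of this sketch, not something the weather service above is known to do.

```javascript
// Hypothetical server-side helper: builds the body of a JSONP response.
function buildJsonpResponse(callbackName, data) {
    // Assumption of this sketch: only plain ASCII identifiers are accepted as
    // callback names, so the caller cannot smuggle other code into the response.
    if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
        throw new Error("invalid callback name");
    }
    // The response is a JavaScript function call with the data as its argument.
    return callbackName + "(" + JSON.stringify(data) + ")";
}

console.log(buildJsonpResponse("theCallback", {
    temperature_celsius: 20,
    temperature_fahrenheit: 68
}));
// theCallback({"temperature_celsius":20,"temperature_fahrenheit":68})
```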

As an aside: You should only use JSONP APIs from sites that you trust, since you're allowing those sites to send you arbitrary JavaScript that you will happily execute on your own domain.

My precious

So we've established that it's possible to execute JavaScript that comes from a different domain; a site that wants to allow cross-origin access can send valid JavaScript that any site can thus embed. Now let's look at the case where the site doesn't want to allow this; it only wants the data to be accessible to requests coming from its own site.

As far as simple reading of the request's response goes, same-origin takes care of this. But you obviously also want to prevent the above “execute as JavaScript” trick from working. The lowest-hanging fruit is just making sure that your response doesn't look like a function call. However, there is such a thing as a function call in disguise – Phil Haack wrote a blog post and a more detailed follow-up on this topic, which will tell you why you shouldn't return JSON responses whose outer wrapper is an array.

But there's an even more subtle vulnerability which appears when you don't return JSON at all. Here's a real-life example I found (it was fixed in the meantime). Assume your site uses a CSRF token, and you want to make sure that a browser tab always has the up-to-date token, even when it changes. This could be because you cycle the token at regular intervals or just because in a different browser tab your user logged out and logged in again. That's why you have a route that returns the current CSRF token for the active session, so the browser can regularly poll for it.

Same-origin policy prevents other sites from seeing this token in any way, since you neither wrap it in a function call, nor in an array – in fact, you wrap it in nothing at all; you just return the token as plain text. Your token is a 24-digit hexadecimal number, like b2039487e51a6bfdc7299de0, and these 24 characters are all that your route returns. Nothing in there can cause a function to be executed, so even if a malicious site included

<script src="http://your-awesome-site.com/current-csrf-token"></script>

in its HTML, it wouldn't have access to the token.

Right? Nope.

Not as hidden as you'd think

Let's see what happens when a browser executes a “script” that contains nothing more than b2039487e51a6bfdc7299de0. If you try it out, you'll see that not a lot happens, except that the JavaScript console will show an error:

ReferenceError: b2039487e51a6bfdc7299de0 is not defined

The first thing to note is that this is not a syntax error. Because b2039487e51a6bfdc7299de0 is a valid JavaScript identifier, this 24-character long “script” is syntactically valid JavaScript, and the browser happily runs it. So what happens when the JavaScript engine encounters an identifier? It will look for a variable in the current scope by this name. Since the script is executed in a global context, the current scope is the window object. The engine checks whether window["b2039487e51a6bfdc7299de0"] exists, which it obviously doesn't, and thus throws an exception.
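The difference between the two kinds of error can be checked outside the browser as well. The following is only a rough sketch: classifyAsScript is a hypothetical helper, and new Function stands in for loading the text through a <script> element, which is close but not identical.

```javascript
// Hypothetical helper: reports what happens when `source` is run as JavaScript.
// `new Function` is used as a stand-in for a <script> element.
function classifyAsScript(source) {
    var fn;
    try {
        fn = new Function(source); // parsing happens here
    } catch (e) {
        return e.name === "SyntaxError" ? "syntax error" : "other error";
    }
    try {
        fn(); // the identifier lookup happens here
    } catch (e) {
        return e.name === "ReferenceError" ? "reference error" : "other error";
    }
    return "no error";
}

// Starts with a letter: a valid identifier that is simply not defined.
console.log(classifyAsScript("b2039487e51a6bfdc7299de0")); // reference error
// Starts with a digit followed by a letter: not valid JavaScript at all.
console.log(classifyAsScript("9b039487e51a6bfdc7299de0")); // syntax error
```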

This looks like something that the malicious site cannot gain any information from, but unfortunately in Firefox, it can. Firefox supports Proxy objects by default. Chrome also supports them, but only if you enable experimental JavaScript (and with a slightly different syntax than Firefox – point being: this exploit can be made to work in Chrome as well, but currently only for users who have the experimental JS features turned on).

Proxy objects are essentially property accessors on speed. If you're used to Python, think “overriding __getattr__.” If you're used to C#, think “TryGetMember on DynamicObject.” Together with prototype magic, they enable us to hook into the “looking for a variable in the current scope” process. If the user visits the following page in Firefox, the JavaScript can identify the token:

<html>
    <head>
        <script type="text/javascript">
            window.onload = function () {
                window.__proto__ = new Proxy(window.__proto__, {
                    has: function (target, name) {
                        if (/^[a-f0-9]{24}$/.test(name))
                            alert("Your CSRF token on your-awesome-site.com is " + name);
                        return name in target;
                    }
                });
                var s = document.createElement("script");
                s.src = "http://your-awesome-site.com/current-csrf-token";
                document.head.appendChild(s);
            }
        </script>
    </head>
    <body></body>
</html>

– and an alert is just the least evil thing the site can do with the token.

Observant readers may have noticed that the above only works if the token starts with a letter, since only then is it a valid identifier. But for a random hexadecimal digit that's six out of sixteen, thus a 37.5% chance. That's not too shabby. And of course it depends on what kind of data your plain-text API returns.
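The 37.5% figure can be reproduced by counting which of the sixteen hexadecimal digits may begin an identifier. This is a small sketch with a simplified, ASCII-only notion of “identifier start”; the variable names are made up for illustration.

```javascript
// All sixteen possible first characters of a lowercase hexadecimal token.
var hexDigits = "0123456789abcdef".split("");

// Simplified assumption: an identifier may start with an ASCII letter, _ or $.
var identifierStarts = hexDigits.filter(function (digit) {
    return /^[a-z_$]$/i.test(digit);
});

var leakChance = identifierStarts.length / hexDigits.length;
console.log(identifierStarts.join("")); // abcdef
console.log(leakChance); // 0.375
```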

If the return values are predictable, and the evil site only cares about a yes-or-no question, it doesn't even need proxy objects. For example, let's say a similar route returns the user name of the currently logged-in user, and all the malicious site wants to know is whether a visitor is logged in as “balpha” on your-awesome-site.com. For that, they would just have to create a property called "balpha" with a get accessor on the global object. These are supported by all major browsers.
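A minimal sketch of that yes-or-no probe might look as follows. The helper responseMatchesName is hypothetical, and the indirect eval call merely stands in for the browser executing the cross-domain <script> response; globalThis is used so the sketch also runs outside a browser, where window would be the global object.

```javascript
// Hypothetical helper: checks whether a plain-text response equals `candidate`
// by watching for a lookup of that name on the global object.
function responseMatchesName(candidate, responseText) {
    var seen = false;
    Object.defineProperty(globalThis, candidate, {
        configurable: true, // so the property can be removed again below
        get: function () {
            seen = true; // the "script" looked up exactly this name
            return undefined;
        }
    });
    try {
        // Indirect eval runs the text in the global scope; it stands in for
        // the browser executing a <script src="..."> response.
        (0, eval)(responseText);
    } catch (e) {
        // Any other user name is an undefined identifier (ReferenceError).
    } finally {
        delete globalThis[candidate];
    }
    return seen;
}

console.log(responseMatchesName("balpha", "balpha")); // true
console.log(responseMatchesName("balpha", "someoneelse")); // false
```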

What's to take away from this? If you respond with plain text data to a GET request, don't be too sure that the same origin policy will save you. Either wrap your data in a JSON object (not an array!), or only respond to POST requests with such data (unless you're a REST enthusiast, in which case POST won't always be an option), or both. Other possible mitigations include the while(1) trick, or at the very least making sure that your secret payload is not syntactically valid JavaScript.
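Combining two of these mitigations, a token route could return something like the output of the sketch below. The helper names wrapToken and unwrapToken and the exact response shape are assumptions for illustration; the point is only that same-origin code, which can read the response as text, strips the guard before parsing, while a cross-domain <script> include would get stuck in the loop.

```javascript
// Guard prefix: if the response is ever executed as a script, it loops forever
// before reaching the data.
var GUARD = "while(1);";

// Hypothetical server-side helper: wraps the token in a JSON object
// (not an array) and prepends the guard.
function wrapToken(token) {
    return GUARD + JSON.stringify({ token: token });
}

// Hypothetical client-side helper for same-origin code that fetched the
// response body as text.
function unwrapToken(body) {
    if (body.indexOf(GUARD) !== 0) {
        throw new Error("missing guard prefix");
    }
    return JSON.parse(body.slice(GUARD.length)).token;
}

var wrapped = wrapToken("b2039487e51a6bfdc7299de0");
console.log(wrapped); // while(1);{"token":"b2039487e51a6bfdc7299de0"}
console.log(unwrapToken(wrapped)); // b2039487e51a6bfdc7299de0
```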