A group of privacy and security experts sent a letter today urging Google to strengthen its leadership role in web application security, and we wanted to offer some of our thoughts on the subject.

We've long advocated for — and demonstrated — a focus on strong security in web applications. We run our own business on Google Apps, and we strive to provide a high level of security to our users. We currently let people access a number of our applications — including Gmail, Google Docs, and Google Calendar — via HTTPS, a protocol that establishes a secure connection between your browser and our servers.

Let's take a closer look at how this works in the case of Gmail. We know that tens of millions of Gmail users rely on it to manage their lives every day, and we have offered HTTPS access as an option in Gmail from the day we launched.
If you choose to use HTTPS in Gmail, our systems are designed to maintain it throughout the email session — not just at login — so everything you do can be passed through a more secure connection. Last summer we made it even easier by letting Gmail users opt in to always use HTTPS every time they log in (no need to type or bookmark the "https").

Free, always-on HTTPS is pretty unusual in the email business, but we see it as another way to make the web safer and more useful. It's something we'd like to see all major webmail services provide.

In fact, we're currently looking into whether it would make sense to turn on HTTPS as the default for all Gmail users.

We know HTTPS is a good experience for the many power users who've already turned it on as their default setting, and in this case the additional cost of offering HTTPS isn't holding us back. But we want to understand the impact on people's experience more completely, analyze the data, and make sure there are no negative effects. Ideally we'd like HTTPS to be on by default for all connections, and we're investigating the trade-offs, since HTTPS has some downsides — in some cases it makes certain actions slower.

We're planning a trial in which we'll move small samples of different types of Gmail users to HTTPS to see what their experience is, and whether it affects the performance of their email. Does it load fast enough? Is it responsive enough? Are there particular regions, or networks, or computer setups that do particularly poorly on HTTPS?
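
As a rough illustration of the kind of measurement such a trial involves, here is a minimal sketch (in Python, with hypothetical endpoints) that compares average fetch times over HTTP and HTTPS. A real trial would segment users by region, network, and client configuration and measure full page loads, not raw fetches.

import time
import urllib.request

def fetch_ms(url, n=5):
    # Average wall-clock time to fetch a URL, in milliseconds.
    total = 0.0
    for _ in range(n):
        start = time.perf_counter()
        urllib.request.urlopen(url).read()
        total += time.perf_counter() - start
    return 1000 * total / n

# Hypothetical endpoints for illustration only.
for url in ("http://mail.example.com/", "https://mail.example.com/"):
    print(url, round(fetch_ms(url), 1), "ms")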

Unless there are negative effects on the user experience or it's otherwise impractical, we intend to turn on HTTPS by default more broadly, hopefully for all Gmail users. We're also considering how to make this work best for other apps including Google Docs and Google Calendar (we offer free HTTPS for those apps as well).

Stay tuned, but we wanted to share our thinking on this, and to let you know we're always looking at ways to make the web more secure and more useful.

Update @ 1:00pm: We've had some more time to go through the report. There's a factual inaccuracy we wanted to point out: a cookie from Docs or Calendar doesn't give access to a Gmail session. The master authentication cookie is always sent over HTTPS — whether or not the user specified HTTPS-only for their Gmail account. But we can all agree on the benefits of HTTPS, and we're glad that the report recognizes our leadership role in this area. As the report itself points out, "Users of Microsoft Hotmail, Yahoo Mail, Facebook and MySpace are also vulnerable to [data theft and account hijacking]. Worst of all — these firms do not offer their customers any form of protection. Google at least offers its tech savvy customers a strong degree of protection from snooping attacks." We take security very seriously, and we're proud of our record of providing security for free web apps.

Update on June 26th: We've sent a response to the signatories of the letter. You can read it here.


A recent surge in compromised web servers has generated many interesting discussions in online forums and blogs. We thought we would join the conversation by sharing what we found to be the most popular malware sites in the last two months.

As we've discussed previously, we constantly scan our index for potentially dangerous sites. Our automated systems found more than 4,000 different sites that appeared to be set up for distributing malware by massively compromising popular web sites. Of these domains, more than 1,400 were hosted in the .cn TLD. Several contained plays on the name of Google, such as goooogleadsence.biz.


[Graph: top-10 malware distribution domains, by number of compromised web sites referencing them]

The graph shows the top-10 malware sites, as counted by the number of compromised web sites that referenced them. All domains on the top-10 list are suspected to have compromised more than 10,000 web sites on the Internet. The graph also contains arrows indicating when these domains were first listed via the Safe Browsing API and flagged in our search results as potentially dangerous.

Other malware researchers reported widespread compromises pointing to the domains gumblar.cn and martuz.cn, both of which made it onto our top-10 list. For gumblar.cn, we saw about 60,000 compromised sites; martuz.cn peaked at slightly over 35,000 sites. Beladen.net was also reported to be part of a mass compromise, but reached only position 124 on our list, with about 3,500 compromised sites.
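
To give a sense of how a ranking like this is computed, here is a minimal sketch: given (compromised site, referenced malware domain) pairs produced by a crawl, tally the number of referencing sites per domain. The sample pairs below are made up for illustration; only the malware domain names come from the discussion above.

from collections import Counter

# (compromised site, malware domain it references) pairs, as produced
# by scanning the index; the site names here are invented.
scan_results = [
    ("blog.example.com", "gumblar.cn"),
    ("shop.example.org", "gumblar.cn"),
    ("news.example.net", "martuz.cn"),
    # ... many more pairs from the crawl
]

references = Counter(domain for _site, domain in scan_results)
for domain, count in references.most_common(10):
    print(f"{domain}: referenced by {count} compromised sites")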

To help make the Internet a safer place, our Safe Browsing API is freely available and is being used by browsers such as Firefox and Chrome to protect users on the web.
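
For a flavor of how a client-side check against such a list can work, the sketch below generates simplified host-suffix/path-prefix lookup expressions for a URL and tests them against a locally downloaded blocklist. This is an illustration only: the real protocol canonicalizes URLs far more carefully and compares hashes rather than plaintext expressions.

from urllib.parse import urlparse

def lookup_expressions(url):
    # Simplified host-suffix / path-prefix expressions, in the spirit of
    # the Safe Browsing lookup rules. The real client canonicalizes URLs
    # much more carefully and compares hashes, not plaintext strings.
    parts = urlparse(url)
    labels = parts.hostname.split(".")
    hosts = {parts.hostname}
    hosts.update(".".join(labels[i:]) for i in range(1, len(labels) - 1))
    segments = [s for s in parts.path.split("/") if s]
    paths = {"/", parts.path or "/"}
    paths.update("/" + "/".join(segments[:i]) + "/"
                 for i in range(1, len(segments)))
    if parts.query:
        paths.add(parts.path + "?" + parts.query)
    return {host + path for host in hosts for path in paths}

def is_flagged(url, blocklist):
    # blocklist: a set of flagged expressions downloaded ahead of time.
    return not blocklist.isdisjoint(lookup_expressions(url))

print(is_flagged("http://goooogleadsence.biz/ads/show.php?id=1",
                 {"goooogleadsence.biz/"}))  # True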


Building on our earlier posts on defenses against web application flaws ["Automating Web Application Security Testing", "Meet ratproxy, our passive web security assessment tool"], we introduce Automatic Context-Aware Escaping (Auto-Escape for short), a functionality we added to two Google-developed general purpose template systems to better protect against Cross-Site Scripting (XSS).

We developed Auto-Escape specifically for general purpose template systems; that is, template systems that are for the most part unaware of the structure and programming language of the content on which they operate. These template systems typically provide minimal support for web applications, possibly limited to basic escaping functions that a developer can invoke to help escape unsafe content being returned in web responses. Our observation has been that web applications of substantial size and complexity using these template systems have an increased risk of introducing XSS flaws. To see why this is the case, consider the simplified template below in which double curly brackets {{ and }} enclose placeholders (variables) that are replaced with run-time content, presumed unsafe.


<body>
<span style="color:{{USER_COLOR}};">
Hello {{USER_NAME}}, view your <a href="{{USER_ACCOUNT_URL}}">Account</a>.
</span>
<script>
var id = {{USER_ID}}; // some code using id, say:
// alert("Your user ID is: " + id);
</script>
</body>

In this template, four variables are used (listed here in a different order):

  • USER_NAME is inserted into regular HTML text and hence can be escaped safely by HTML-escape.
  • USER_ACCOUNT_URL is inserted into an HTML attribute that expects a URL and therefore in addition to HTML-escape, also requires validation that the URL scheme is safe. By allowing only a safe white-list of schemes, we can prevent (say) javascript: pseudo-URLs, which HTML-escape alone does not prevent.
  • USER_COLOR is inserted into a Cascading Style Sheets (CSS) context and therefore requires an escaping that also prevents scripting and other dangerous constructs in CSS such as those possible in expression() or url(). For more information on concerns with harmful content in CSS, refer to the CSS section of the Browser Security Handbook.
  • USER_ID is inserted into a Javascript variable that expects a number, as it is not enclosed in quotes. As such, it requires an escaping that coerces it to a number (which a typical Javascript-escape function does not do); otherwise it can lead to arbitrary Javascript execution. More variants may be developed to coerce content to other data types, including arrays and objects.

Each of these variable insertions requires a different escaping method or risks introducing XSS. To keep the example small, we excluded several contexts of interest, particularly style tags, HTML attributes that expect Javascript (such as onmouseover), and considerations of whether attribute values are enclosed within quotes or not (which also affects escaping).
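
To make the four contexts concrete, here is a sketch of what context-appropriate escaping functions might look like in Python. The function names and the exact whitelists are our own simplifications; a production template system would be considerably stricter.

import html
import re
from urllib.parse import urlparse

def escape_html(value):
    # For USER_NAME in plain HTML text: entity-encode &, <, >, and quotes.
    return html.escape(str(value), quote=True)

def escape_url_attr(value):
    # For USER_ACCOUNT_URL: allow only a whitelist of schemes, so that
    # javascript: pseudo-URLs never survive, then HTML-escape the result.
    if urlparse(str(value)).scheme.lower() not in ("", "http", "https"):
        return "#"
    return html.escape(str(value), quote=True)

def escape_css_value(value):
    # For USER_COLOR: permit only characters safe in a simple CSS value,
    # which rules out expression(...) and url(...) constructs entirely.
    return str(value) if re.fullmatch(r"[#\w ,.%-]+", str(value)) else "inherit"

def escape_js_number(value):
    # For USER_ID in an unquoted script context: coerce to a number rather
    # than escape, so arbitrary code can never be emitted (raises on junk).
    return str(int(value))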

Auto-Escape

The example above demonstrates the importance of understanding the precise context in which variables are being inserted and the need for escaping functions that are both safe and correct for each. In large, complex web applications, we see two related vectors for XSS:

  1. A developer forgetting to apply escaping to a given variable.
  2. A developer applying the wrong escaping for that variable for the context in which it is being inserted.

Considering the sheer number of templates in large web applications and the amount of untrusted content they may operate on, proper escaping becomes complicated and error prone. It is also difficult to audit efficiently from a security testing perspective. We developed Auto-Escape to move that complexity away from the developer and into the template system, and therefore reduce the risk of XSS flaws that would otherwise ensue.

A Look at Implementation

Auto-Escape is functionality designed to make the template system aware of the web application context, and therefore able to apply the required escaping automatically and correctly. This is achieved in three parts:

  1. We determined all the different contexts in which untrusted content may be returned and provided proper escaping functions for each. This is part science and part pragmatism. For example, we did not find the need to support variable insertion inside an HTML tag name itself (as opposed to HTML attributes), so we did not build support for it. Other factors come into play, including the availability of existing escaping functions and backwards compatibility. As a result, part of that work is template system dependent.

  2. We developed our own parser to parse HTML and Javascript templates. It provides methods which can be queried at a point of interest to obtain the context information necessary for proper escaping. The parser is designed with performance in mind, and it runs in a stream mode without look-ahead. It aims for simplicity while understanding that browsers may be more lenient than specifications, particularly in certain corner cases.

  3. We added an extra step to the parsing that the template system already performs to locate variables, among other needs. This extra step activates our HTML/Javascript parser, queries it for the context of each variable, then applies its escaping rules to determine the proper escaping functions to use for that variable. Depending on the template system, this step may be performed only the first time a template is used, or for each web response, in which case some limitations may be lifted.

A simple mechanism is provided for the developer to indicate that some variables are safe and should not be escaped. This is used for variables that are either escaped through other means in source code or contain trusted markup that should be emitted intact.
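
Putting the pieces together, the toy renderer below illustrates the overall mechanism: infer a coarse context from the template text preceding each variable, apply the matching escaper, and honor a "{{NAME:safe}}" modifier as the developer's opt-out. It is a drastic simplification of the streaming parser described above, intended only to show the flow, and it reuses the simplified escaping rules from the earlier sketch.

import html
import re
from urllib.parse import urlparse

VAR = re.compile(r"\{\{(\w+)(:safe)?\}\}")
UNCLOSED_ATTR = re.compile(r'(\w+)\s*=\s*"[^"]*$')

def escaper_for(prefix):
    # Infer a coarse context from the template text before a variable;
    # the real parser runs in stream mode over HTML and Javascript.
    low = prefix.lower()
    if low.rfind("<script") > low.rfind("</script>"):
        return lambda v: str(int(v))          # coerce to number in JS context
    m = UNCLOSED_ATTR.search(prefix)
    if m and m.group(1).lower() in ("href", "src"):
        return lambda v: ("#" if urlparse(str(v)).scheme.lower()
                          not in ("", "http", "https")
                          else html.escape(str(v), quote=True))
    if m and m.group(1).lower() == "style":
        return lambda v: (str(v) if re.fullmatch(r"[#\w ,.%-]+", str(v))
                          else "inherit")
    return lambda v: html.escape(str(v), quote=True)  # plain HTML text

def render(template, values):
    out, pos = [], 0
    for m in VAR.finditer(template):
        out.append(template[pos:m.start()])
        value = values[m.group(1)]
        # "{{NAME:safe}}" is the developer's opt-out for trusted markup.
        out.append(str(value) if m.group(2)
                   else escaper_for(template[:m.start()])(value))
        pos = m.end()
    return "".join(out) + template[pos:]

# With the sample template above, {{USER_COLOR}} gets the CSS escaper,
# {{USER_ACCOUNT_URL}} the URL check, and {{USER_ID}} number coercion.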

Current Status

Auto-Escape has been available in the C++ Google Ctemplate release for a while now and continues to develop there. You can read more about it in the Guide to using Auto-Escape. We also implemented Auto-Escape for the ClearSilver template system and expect it to be released in the near future. Lastly, we are in the process of integrating it into other template systems developed at Google for Java and Python, and we are interested in working with a few other open source template systems that may benefit from this logic. Our HTML/Javascript parser is already available with the Google Ctemplate distribution and is expected to be released as a stand-alone open source project very soon.

Co-developers: Filipe Almeida and Mugdha Bendre

Posted by Eric Sachs, Senior Product Manager, Google Security

Google’s participation in the Internet Identity Workshop (IIW) has grown from a few lone individuals at its founding in 2005 to fifteen Googlers at the last IIW. The reason for this growth is that as Google has started to provide more APIs and developer tools for our application hosting business, we have found that standards and interoperability for identity and security on the Internet are critical.  Our engineers attend to discuss standards such as OAuth, OpenSocial, SAML, and Portable Contacts, as well as longer term trends around discovery, malware, phishing, and stronger authentication.  Another major topic is the usability of these technologies, which we summarized in a blog post after the last IIW.

We hope that other companies and individuals working in these areas will register to attend IIW 2009a and start building momentum for another great event.  If you attended either the Facebook-hosted UX summit in Feb 2009 or the Yahoo-hosted UX summit in Oct 2008, you can join in further discussions on those topics at the upcoming IIW.

Google attendees: Dirk Balfanz, Nathan Beach, Breno de Medeiros, Cassie Doll, Brian Eaton, Ben Laurie, Kevin Marks, John Panzer, Eric Sachs, and more to come



Many people view the task of writing secure web applications as a very complex challenge, in part because of the inherent shortcomings of technologies such as HTTP, HTML, or Javascript, and in part because of the subtle differences and unexpected interactions between various browser security mechanisms.

Over the years, we have found that a full understanding of browser-specific quirks is critical to making sound security design decisions in modern Web 2.0 applications. For example, the same user-supplied link may appear to one browser as a harmless relative address, while another could interpret it as a potentially malicious Javascript payload. In another case, an application may rely on a particular HTTP request being impossible to spoof from within the browser in order to defend the security of its users; however, an attacker might easily subvert the safeguard by crafting the same request from within a commonly installed browser extension. If not accounted for, these differences can lead to trouble.

In hopes of helping to make the Web a safer place, we decided to release our Browser Security Handbook to the general public. This 60-page document provides a comprehensive comparison of a broad set of security features and characteristics in commonly used browsers, along with (hopefully) useful commentary and implementation tips for application developers who need to rely on these mechanisms, as well as engineering teams working on future browser-side security enhancements.

Please note that given the sheer number of characteristics covered, we expect some kinks in the initial version of the handbook; feedback from browser vendors and security researchers is greatly appreciated.



Most native applications can access everything on your computer – including your files. This access means that you have to make decisions about which apps you trust enough to install, because a malicious or buggy application might harm your machine. Here at Google we believe you shouldn't have to choose between powerful applications and security. That's why we're working on Native Client, a technology that seeks to give Web developers the opportunity to make safer and more dynamic applications that can run on any OS and any browser. Today, we're sharing our technology with the research and security communities for their feedback to help make this technology more useful and more secure.

Our approach is built around a software containment system called the inner-sandbox that is designed to prevent unintended interactions between a native code module and the host system. The inner-sandbox uses static analysis to detect security defects in untrusted x86 code. Previously, such analysis has been challenging due to such practices as self-modifying code and overlapping instructions. In our work, we disallow such practices through a set of alignment and structural rules that, when observed, enable the native code module to be disassembled reliably and all reachable instructions to be identified during disassembly. With reliable disassembly as a tool, it's then feasible for the validator to determine whether the executable includes unsafe x86 instructions. For example, the validator can determine whether the executable includes instructions that directly invoke the operating system that could read or write files or subvert the containment system itself.
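
As a toy model of the rule checking, the sketch below assumes the module has already been disassembled into (offset, length, mnemonic) triples and verifies a paraphrase of the structural rules: contiguous disassembly with no gaps or overlaps, no instructions that directly invoke the operating system, no instruction straddling a 32-byte bundle boundary, and direct control transfers landing only on instruction starts. The real validator operates on raw x86 bytes; none of this is Native Client's actual implementation.

BUNDLE = 32
FORBIDDEN = {"int", "syscall", "sysenter"}   # direct OS invocation

def validate(instructions, direct_jump_targets):
    starts, expected = set(), 0
    for offset, length, mnemonic in instructions:
        if offset != expected:
            return False      # gap or overlap: disassembly is not reliable
        if mnemonic in FORBIDDEN:
            return False      # a reachable instruction could invoke the OS
        if offset // BUNDLE != (offset + length - 1) // BUNDLE:
            return False      # instruction straddles a 32-byte bundle
        starts.add(offset)
        expected = offset + length
    # Direct control transfers must land on instruction starts; indirect
    # ones are masked to bundle starts, which the straddle rule above
    # guarantees are also instruction starts.
    return all(t in starts for t in direct_jump_targets)

print(validate([(0, 2, "mov"), (2, 2, "add"), (4, 2, "ret")], {2}))  # True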

To learn more and help test Native Client, check out our post on the Google Code blog as well as our developer site. Our developer site includes our research paper and of course the source for the project under the BSD license.

We look forward to hearing what you think!

Eric Sachs & Ben Laurie, Google Security

One of the major conferences on Internet identity standards is the Internet Identity Workshop (IIW), a semiannual 'un-conference' where the sessions are not determined ahead of time. It is attended by a large set of people who work on Internet security and identity standards such as OAuth, OpenID, SAML, InfoCards, etc.  A major theme within the identity community this year has been about improving the user experience and growing the adoption of these technologies.  The OpenID community is making great progress on user experience, with Yahoo, AOL, and Google quickly improving the support they provide (read a summary from Joseph Smarr of Plaxo).  Similarly, the InfoCard community has been working on simplifying the user experience of InfoCard technology, including the updated CardSpace selector from Microsoft.

Another hot topic at IIW centered around how to improve the user experience when testing alternatives and enhancements to passwords to make them less susceptible to phishing attacks.  Many websites and enterprises have tried these password enhancements and alternatives, but found that people complained they were hard to use, or that they weren't portable enough for people who use multiple computers, including web cafes and smart phones.  We have published an article summarizing some of the community's current ideas for how to deploy these new authentication mechanisms using a multi-layered approach that minimizes the additional work required of users.  We have also pulled together a set of videos showing how a number of these different approaches work with both web-based and desktop applications.  We hope this information will be helpful to other websites and enterprises that are concerned about phishing.



We've seen some speculation recently about a purported security vulnerability in Gmail and the theft of several website owners' domains by unauthorized third parties. At Google we're committed to providing secure products, and we mounted an immediate investigation. Our results indicate no evidence of a Gmail vulnerability.

With help from affected users, we determined that the cause was a phishing scheme, a common method used by malicious actors to trick people into sharing their sensitive information. Attackers sent customized e-mails encouraging web domain owners to visit fraudulent websites such as "google-hosts.com" that they set up purely to harvest usernames and passwords. These fake sites had no affiliation with Google, and the ones we've seen are now offline. Once attackers gained the user credentials, they were free to modify the affected accounts as they desired. In this case, the attacker set up mail filters specifically designed to forward messages from web domain providers.

Several news stories referenced a domain theft from December 2007 that was incorrectly linked to a Gmail CSRF vulnerability. We did have a Gmail CSRF bug reported to us in September 2007 that we fixed worldwide within 24 hours of private disclosure of the bug details. Neither this bug nor any other Gmail bug was involved in the December 2007 domain theft.

We recognize how many people depend on Gmail, and we strive to make it as secure as possible. We'd like to thank the wider security community for working with us toward this goal. We're always looking at new ways to enhance Gmail security. For example, we recently gave users the option to always run their entire session over HTTPS.

To keep your Google account secure online, we recommend you only ever enter your Gmail sign-in credentials at web addresses starting with https://www.google.com/accounts, and never click through any warnings your browser may raise about certificates. For more information on how to stay safe from phishing attacks, see our blog post here.



A year ago, a number of large and small websites announced a new open standard called OAuth. This standard is designed to provide a secure and privacy-preserving technique for enabling specific private data on one site to be accessed by another site. One popular reason for that type of cross-site access is data portability in areas such as personal health records (such as Google Health or Microsoft HealthVault), as well as social networks (such as OpenSocial-enabled sites). I originally became involved in this space in the summer of 2005, when Google started developing a feature called AuthSub, which was one of the precursors of OAuth. That was a proprietary protocol, but one that has been used by hundreds of websites to provide add-on services to Google Account users by getting permission from users to access data in their Google Accounts. In fact, that was the key feature that a few of us used to start the Google Health portability effort back when it was only a prototype project with a few dedicated Googlers.

However, with the development of a common Internet standard in OAuth, we see much greater potential for data portability and secure mash-ups. Today we announced that the gadget platform now supports OAuth, and the interoperability of this standard was demonstrated by new iGoogle gadgets that AOL and MySpace both built to enable users to see their respective AOL or MySpace mailboxes (and other information) while on iGoogle. However, to ensure the user's privacy, this only works after the user has authorized AOL or MySpace to make their data available to the gadget running on iGoogle. We also previously announced that third-party developers can build their own iGoogle gadgets that access the OAuth-enabled APIs for Google applications such as Calendar, Picasa, and Docs. In fact, since both the gadget platform and OAuth technology are open standards, we are working to help other companies who run services similar to iGoogle to enhance them with support for these standards. Once that is in place, these new OAuth-powered gadgets that are available on iGoogle will also work on those other sites, including many of the gadgets that Google offers for its own applications. This provides a platform for some interesting mash-ups. For example, a third-party developer could create a single gadget that uses OAuth to access both Google OAuth-enabled APIs (such as a Gmail user's address book) and MySpace OAuth-enabled APIs (such as a user's friend list) and display a mashup of the combination.

While the combination of OAuth with gadgets is an exciting new use of the technology, most of the use of OAuth is between websites, such as to enable a user of Google Health to allow a clinical trial matching site to access his or her health profile. I previously mentioned that one privacy control provided by OAuth is that it defines a standard way for users to authorize one website to make their data accessible to another website. In addition, OAuth provides a way to do this without the first site needing to reveal the identity of the user -- it simply provides a different opaque security token to each additional website the user wants to share his or her data with. It would allow a mutual fund, for example, to provide an iGoogle gadget to their customers that would run on iGoogle and show the user the value of his or her mutual fund, but without giving Google any unique information about the user, such as a social security number or account number. In the future, maybe we will even see industries like banks use standards such as OAuth to allow their customers to authorize utility companies to perform direct debit from the user's bank account without that person having to actually share his or her bank account number with the utility vendor.
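
For readers curious about the mechanics, the sketch below shows how a site signs a request under OAuth 1.0's HMAC-SHA1 method using its consumer secret and the opaque token's secret, so it can prove authorization without ever learning who the user is. The credentials and endpoint are hypothetical.

import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import quote

def pct(s):
    # OAuth's percent-encoding: everything except unreserved characters.
    return quote(str(s), safe="~")

def oauth_signature(method, url, params, consumer_secret, token_secret=""):
    # 1. Normalize parameters: percent-encode each pair, sorted by name.
    normalized = "&".join(f"{pct(k)}={pct(v)}" for k, v in sorted(params.items()))
    # 2. Signature base string: METHOD&URL&PARAMS, each percent-encoded.
    base = "&".join([method.upper(), pct(url), pct(normalized)])
    # 3. Signing key: consumer secret and token secret joined by '&'.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    return base64.b64encode(
        hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()).decode()

# Hypothetical credentials and endpoint for illustration only.
params = {
    "oauth_consumer_key": "example-gadget",
    "oauth_token": "opaque-user-token",   # reveals nothing about the user
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": str(int(time.time())),
    "oauth_nonce": secrets.token_hex(8),
    "oauth_version": "1.0",
}
params["oauth_signature"] = oauth_signature(
    "GET", "https://example.com/api/portfolio", params,
    "consumer-secret", "token-secret")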

The OAuth community is continuing to enhance this standard and is very interested in having more companies engaged with its development. The OAuth.net website has more details about the current standard, and I maintain a website with advanced information about Google's use of OAuth, including work on integrating OAuth with desktop apps, and integrating with federation standards such as OpenID and SAML. If you're interested in engaging with the OAuth community, please get in touch with us.



"This site may harm your computer"
You may have seen those words in Google search results — but what do they mean? If you click the search result link, you get another warning page instead of the website you were expecting. But if the web page is your grandmother's baking blog, you're still confused. Surely your grandmother hasn't been secretly honing her l33t computer hacking skills at night school. Google must have made a mistake; your grandmother's web page is just fine...



I work with the team that helps put the warning in Google's search results, so let me try to explain. The good news is that your grandmother is still kind and loves turtles. She isn't trying to start a botnet or steal credit card numbers. The bad news is that her website or the server that it runs on probably has a security vulnerability, most likely from some out-of-date software. That vulnerability has been exploited and malicious code has been added to your grandmother's website. It's most likely an invisible script or iframe that pulls content from another website that tries to attack any computer that views the page. If the attack succeeds, then viruses, spyware, key loggers, botnets, and other nasty stuff will get installed.
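
As a crude illustration of one signal a scanner can key on, the snippet below flags pages containing invisible iframes. Real detection is far more involved (for example, observing what actually happens when a page is loaded in an instrumented environment), so treat this purely as a sketch.

import re

# One crude signal among many: an iframe styled to be invisible.
HIDDEN_IFRAME = re.compile(
    r'<iframe[^>]*(width\s*=\s*["\']?0\b|height\s*=\s*["\']?0\b'
    r'|visibility\s*:\s*hidden|display\s*:\s*none)[^>]*>',
    re.IGNORECASE)

def looks_suspicious(page_html):
    return bool(HIDDEN_IFRAME.search(page_html))

print(looks_suspicious(
    '<iframe src="http://bad.example/x" width=0 height=0></iframe>'))  # True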

If you see the warning on a site in Google's search results, it's a good idea to pay attention to it. Google has automatic scanners that are constantly looking for these sorts of web pages. I help build the scanners and continue to be surprised by how accurate they are. There is almost certainly something wrong with the website even if it is run by someone you trust. The automatic scanners make unbiased decisions based on the malicious content of the pages, not the reputation of the webmaster.

Servers are just like your home computer and need constant updating. There are lots of tools that make building a website easy, but each one adds some risk of being exploited. Even if you're diligent and keep all your website components updated, your web host may not be. They control your website's server and may not have installed the most recent OS patches. And it's not just innocent grandmothers that this happens to. There have been warnings on the websites of banks, sports teams, and corporate and government websites.

Uh-oh... I need help!
Now that we understand what the malware label means in search results, what do you do if you're a webmaster and Google's scanners have found malware on your site?

There are some resources to help clean things up. The Google Webmaster Central blog has some tips and a quick security checklist for webmasters. Stopbadware.org has great information, and their forums have a number of helpful and knowledgeable volunteers who may be able to help (sometimes I'm one of them). You can also use the Google SafeBrowsing diagnostics page for your site (http://www.google.com/safebrowsing/diagnostic?site=<site-name-here>) to see specific information about what Google's automatic scanners have found. If your site has been flagged, Google's Webmaster Tools lists some of the URLs that were scanned and found to be infected.

Once you've cleaned up your website, use Google's Webmaster Tools to request a malware review. The automatic systems will rescan your website and the warning will be removed if the malware is gone.

Advance warning
I often hear webmasters asking Google for advance warning before a malware label is put on their website. When the label is applied, Google usually emails the website owners and then posts a warning in Google's Webmaster Tools. But no warning is given ahead of time, so a webmaster has no chance to clean up the site before the label is applied.

But, look at the situation from the user's point of view. As a user, I'd be pretty annoyed if Google sent me to a site it knew was dangerous. Even a short delay would expose some users to that risk, and it doesn't seem justified. I know it's frustrating for a webmaster to see a malware label on their website. But, ultimately, protecting users against malware makes the internet a safer place and everyone benefits, both webmasters and users.

Google's Webmaster Tools has started a test to provide warnings to webmasters that their server software may be vulnerable. Responding to that warning and updating server software can prevent your website from being compromised with malware. The best way to avoid a malware label is to never have any malware on the site!

Reviews
You can request a review via Google's Webmaster Tools and you can see the status of the review there. If you think the review is taking too long, make sure to check the status. Finding all the malware on a site is difficult and the automated scanners are far more accurate than humans. The scanners may have found something you've missed and the review may have failed. If your site has a malware label, Google's Webmaster Tools will also list some sample URLs that have problems. This is not a full list of all of the problem URLs (because that's often very, very long), but it should get you started.

Finally, don't confuse a malware review with a request for reconsideration. If Google's automated scanners find malware on your website, the site will usually not be removed from search results. There is also a different process that removes spammy websites from Google search results. If that's happened and you disagree with Google, you should submit a reconsideration request. But if your site has a malware label, a reconsideration request won't do any good — for malware you need to file a malware review from the Overview page.



How long will a review take?
Webmasters are eager to have a Google malware label removed from their site and often ask how long a review of the site will take. Both the original scanning and the review process are fully automated. The systems analyze large portions of the internet, which is a big place, so the review may not happen immediately. Ideally, the label will be removed within a few hours. At its longest, the process should take a day or so.