Official Google Webmaster Central Blog

Rich Snippets Instructional Videos

Friday, December 16, 2011

Webmaster level: All

When users come to Google, they have a pretty good idea of what they’re looking for, but they need help deciding which result might have the information that best suits their needs. So, the challenge for Google is to make it very clear to our users what content exists on a page in both a useful and concise manner. That’s why we have rich snippets.

Essentially, rich snippets provide you with the ability to help Google highlight aspects of your page. Whether your site contains information about products, recipes, events or apps, a few simple additions to your markup can result in more engagement with your content -- and potentially more traffic to your site.

To help you get started or fine tune your rich snippets, we’ve put together a series of tutorial videos for webmasters of all experience levels. These videos provide guidance as you mark up your site so that Google is better able to understand your content. We can use that content to power the rich snippets we display for your pages. Check out the videos below to get started:

For more information on how to use rich snippets markup for your site, visit our Help Center.

Posted by Alejandro Goyen, Product Manager

Introducing smartphone Googlebot-Mobile

Thursday, December 15, 2011

Webmaster level: All
With the number of smartphone users rapidly rising, we’re seeing more and more websites providing content specifically designed to be browsed on smartphones. Today we are happy to announce that Googlebot-Mobile now crawls with a smartphone user-agent in addition to its previous feature phone user-agents. This is to increase our coverage of smartphone content and to provide a better search experience for smartphone users.
Here are the main user-agent strings that Googlebot-Mobile now uses:

Feature phones Googlebot-Mobile:
- SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
- DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
Smartphone Googlebot-Mobile:
- Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  Update October 2013: Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  Update August 2015: Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

The content crawled by smartphone Googlebot-Mobile will be used primarily to improve the user experience on mobile search. For example, the new crawler may discover content specifically optimized to be browsed on smartphones as well as smartphone-specific redirects.
One new feature we’re also launching that uses these signals is Skip Redirect for Smartphone-Optimized Pages. When we discover a URL in our search results that redirects smartphone users to another URL serving smartphone-optimized content, we change the link target shown in the search results to point directly to the final destination URL. This removes the extra latency the redirect introduces leading to a saving of 0.5-1 seconds on average when visiting landing page for such search results. Update 9 August 2013: If a site uses separate URLs to serve desktop and smartphone users, and if we discover accurate rel-alternate-media annotations linking the desktop and smartphone pages, our algorithms may change the link target shown in the search results to point directly to the smartphone page.
Since all Googlebot-Mobile user-agents identify themselves as a specific kind of mobile, please treat each Googlebot-Mobile request as you would a human user with the same phone user-agent. This, and other guidelines are described in our previous blog post and they still apply, except for those referring to smartphones which we are updating today. If your site has treated Googlebot-Mobile specially based on the fact that it only crawls with feature phone user-agents, we strongly recommend reviewing this policy and serving the appropriate content based on the Googlebot-Mobile’s user-agent, so that both your feature phone and smartphone content will be indexed properly.
If you have more questions, please ask on our Webmaster Help forums.
Posted by Yoshikiyo Kato, Software Engineer

Clicks and impressions for authors

Wednesday, December 14, 2011

Webmaster Level: All
(Cross-posted on the Inside Search Blog)

With the latest improvements to the way authorship annotations look in search and the addition of authorship to Google News, authors have been really excited about getting more visibility, and users benefit from seeing the name, photo, and way to connect with the person who created the content.

Authors have also been giving us a lot of feedback on what else they'd like to see, so today we're introducing “Author Stats” in Webmaster Tools that shows you how often your content is showing up on the Google search results page. If you associate your content with your Google Profile either via e-mail verification or a simple link, you can visit Webmaster Tools to see how many impressions and clicks your content got on the Google search results page. Check out what Matt Cutts would see for his content:

To see your information, go to google.com/webmasters and login with the same username you use for your Google+ Profile. On the left hand panel, you can see “Author Stats” under the “Labs” section. This is an experimental feature so we’re continuing to iterate and improve, but we wanted to get early feedback from you. You can e-mail us at authorship-pilot@google.com if you run into any issues or have feedback.

If you’re a content creator interested in learning more about authorship, check out our Help Center.

Posted by Javier Tordable, Software Engineer

Tips for hosting providers and webmasters

Tuesday, December 06, 2011

Webmaster level: All
Some webmasters on our forums ask about hosting-related issues affecting their sites. To help both hosting providers and webmasters recognize, diagnose, and fix such problems, we’d like to share with you some of the common problems we’ve seen and suggest how you can fix them.

Blocking of Googlebot crawling. This is a very common issue usually due to a misconfiguration in a firewall or DoS protection system and sometimes due to the content management system the site runs. Protection systems are an important part of good hosting and are often configured to block unusually high levels of server requests, sometimes automatically. Because, however, Googlebot often performs more requests than a human user, these protection systems may decide to block Googlebot and prevent it from crawling your website. To check for this kind of problem, use the Fetch as Googlebot function in Webmaster Tools, and check for other crawl errors shown in Webmaster Tools.
We offer several tools to webmasters and hosting providers who want more control over Googlebot’s crawling, and to improve crawling efficiency:
- We have detailed help about how you control Googlebot’s crawling using the robots exclusion protocol and configuring URL parameters.
- If you’re worried about rogue bots using the Googlebot user-agent, we offer a way to verify whether a crawler is actually Googlebot.
- If you would like to change how hard Googlebot crawls your site, you can verify your website in Webmaster Tools and change Googlebot’s crawl rate. Hosting providers can verify ownership of their IP addresses too.
We have more information in our crawling and indexing FAQ.
Availability issues. A related type of problem we see is websites being unavailable when Googlebot (and users) attempt to access the site. This includes DNS issues, overloaded servers leading to timeouts and refused connections, misconfigured content distribution networks (CDNs), and many other kinds of errors. When Googlebot encounters such issues, we report them in Webmaster Tools as either URL unreachable errors or crawl errors.
Invalid SSL certificates. For SSL certificates to be valid for your website, they need to match the name of the site. Common problems include expired SSL certificates and servers misconfigured such that all websites on that server use the same certificate. Most web browsers will try warn users in these situations, and Google tries to alert webmasters of this issue by sending a message via Webmaster Tools. The fix for these problems is to make sure to use SSL certificates that are valid for all your website’s domains and subdomains your users will interact with.
Wildcard DNS. Websites can be configured to respond to all subdomain requests. For example, the website at example.com can be configured to respond to requests to foo.example.com, made-up-name.example.com and all other subdomains.
In some cases this is desirable to have; for example, a user-generated content website may choose to give each account its own subdomain. However, in some cases, the webmaster may not wish to have this behavior as it may cause content to be duplicated unnecessarily across different hostnames and it may also affect Googlebot’s crawling.
To minimize problems in wildcard DNS setups, either configure your website to not use them, or configure your server to not respond successfully to non-existent hostnames, either by refusing the connection or by returning an HTTP 404 header.
Misconfigured virtual hosting. The symptom of this problem is that multiple hosts and/or domain names hosted on the same server always return the contents of only one site. To rephrase, although the server hosts multiple sites, it returns only one site regardless of what is being requested. To diagnose the issue, you need to check that the server responds correctly to the Host HTTP header.
Content duplication through hosting-specific URLs. Many hosts helpfully offer URLs for your website for testing/development purposes. For example, if you’re hosting the website http://a.com/ on the hosting provider example.com, the host may offer access to your site through a URL like http://a.example.com/ or http://example.com/~a/. Our recommendation is to have these hosting-specific URLs not publicly accessible (by password protecting them); and even if these URLs are accessible, our algorithms usually pick the URL webmasters intend. If our algorithms select the hosting-specific URLs, you can influence our algorithms to pick your preferred URLs by implementing canonicalization techniques correctly.
Soft error pages. Some hosting providers show error pages using an HTTP 200 status code (meaning “Success”) instead of an HTTP error status code. For example, a “Page not found” error page could return HTTP 200 instead of 404, making it a soft 404 page; or a “Website temporarily unavailable” message might return a 200 instead of correctly returning a 503 HTTP status code. We try hard to detect soft error pages, but when our algorithms fail to detect a web host’s soft error pages, these pages may get indexed with the error content. This may cause ranking or cross-domain URL selection issues.
It’s easy to check the status code returned: simply check the HTTP headers the server returns using any one of a number of tools, such as Fetch as Googlebot. If an error page is returning HTTP 200, change the configuration to return the correct HTTP error status code. Also, keep an eye out for soft 404 reports in Webmaster Tools, on the Crawl errors page in the Diagnostics section.
Content modification and frames. Webmasters may be surprised to see their page contents modified by hosting providers, typically by injecting scripts or images into the page. Web hosts may also serve your content by embedding it in other pages using frames or iframes. To check whether a web host is changing your content in unexpected ways, simply check the source code of the page as served by the host and compare it to the code you uploaded.
Note that some server-side code modifications may be very useful. For example, a server using Google’s mod_pagespeed Apache module or other tools may be returning your code minified for page speed optimization.
Spam and malware. We’ve seen some web hosts and bulk subdomain services become major sources of malware and spam. We try hard to be granular in our actions when protecting our users and search quality, but if we see a very large fraction of sites on a specific web host that are spammy or are distributing malware, we may be forced to take action on the web host as a whole. To help you keep on top of malware, we offer:
- Safe Browsing Alerts for Network Administrators, useful for hosting providers
- Malware notifications in Webmaster Tools for individual websites
- A Safe Browsing API for developers

We hope this list helps both hosting providers and webmasters diagnose and fix these issues. Beyond this list, also think about the qualitative aspects of hosting like quality of service and the helpfulness of support. As always, if you have questions or need more help, please ask in our Webmaster Help Forum.
Written by Pierre Far, Webmaster Trends Analyst

New markup for multilingual content

Monday, December 05, 2011

Many websites serve users from around the world. There are different approaches to serving content appropriate to your users' language and/or region. Last year, we launched support for explicit annotations for web pages rendering the same content with different language templates.
Today we're going further with our support for multilingual content with improved handling for these two scenarios:

Multiregional websites using substantially the same content. Example: English webpages for Australia, Canada and USA, differing only in price
Multiregional websites using fully translated content, or substantially different monolingual content targeting different regions. Example: a product webpage in German, English and French

Specifying language and location

We've expanded our support of the rel="alternate" hreflang link element to handle content that is translated or provided for multiple geographic regions. The hreflang attribute can specify the language, optionally the country, and URLs of equivalent content. By specifying these alternate URLs, our goal is to be able to consolidate signals for these pages, and to serve the appropriate URL to users in search. Alternative URLs can be on the same site or on another domain.

Annotating pages as substantially similar content

Optionally, for pages that have substantially the same content in the same language and are targeted at multiple countries, you may use the rel="canonical" link element to specify your preferred version. We’ll use that signal to focus on that version in search, while showing the local URLs to users where appropriate. For example, you could use this if you have the same product page in German, but want to target it separately to users searching on the Google properties for Germany, Austria, and Switzerland.
Update: to simplify implementation, we no longer recommend using rel=canonical.

Example usage

To explain how it works, let’s look at some example URLs:

http://www.example.com/ - contains the general homepage of a website, in Spanish
http://es-es.example.com/ - is the version for users in Spain, in Spanish
http://es-mx.example.com/ - is the version for users in Mexico, in Spanish
http://en.example.com/ - is the generic English language version

On all of these pages, we could use the following markup to specify language and optionally the region:


<link rel="alternate" hreflang="es" href="http://www.example.com/" />
<link rel="alternate" hreflang="es-ES" href="http://es-es.example.com/" />
<link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/" /> 
<link rel="alternate" hreflang="en" href="http://en.example.com/" />

If you specify a regional subtag, we’ll assume that you want to target that region.
Keep in mind that all of these annotations are to be used on a per-URL basis. You should take care to use the specific URL, not the homepage, for both of these link elements.

More help

As always, if you need more help correctly implementing multiregional and multilingual websites, please see our Help Center article about this topic, or ask in our Webmaster Help Forum.

Written by Christopher Semturs, Software Engineer, Search Infrastructure, Google Switzerland

Grow your audience with Google+

Monday, November 07, 2011

Webmaster Level: All

At Google, we help grow your audience by connecting you with new users. We introduced the +1 button so your site would stand out on search and your users could easily share your content on Google+. But, sometimes you want to join the conversation and post content directly to where people are sharing.

Today we’re introducing Google+ for Business, a collection of tools and products that help you grow your audience. At the core of this is Google+ Pages, your site’s identity on Google+.

Google+ Pages: Have real conversations with the right people
To get your site on Google+, you first need to create a Google+ Page. On your page, you can engage in conversations with your visitors, direct readers back to your site for the latest updates, send tailored messages to specific groups of people, and see how many +1’s you have across the web. Google+ Pages will help you build relationships with your users, encouraging them to spend more time engaging with your content.

Google+ Pages are at the heart of Google+ for Business

Hangouts

Sometimes you might want to chat with your users face-to-face. For example, if you run a food blog, you may want to invite a chef to talk about her favorite recipe, or if you manage a fashion review site, beauty specialists might want to hold how-to sessions with makeup tips. Hangouts make this easy, by letting you have high-quality video chats with nine people with a single click. You can use Hangouts to hold live forums, break news or simply get to know people better, all in real time.

Hangouts let you meet your customers, face-to-face

Circles

Circles allow you to group followers of your Page into smaller audiences. You can then share specific messages with specific groups. For example, you could create a Circle containing your most loyal readers and offer them exclusive content.
The Google+ badge: Grow your audience on Google+

To help your users find your page and start sharing, there are two buttons you can add to your site by visiting our Google+ badge configuration tool:

The Google+ icon, a small icon that directly links to your Page.

The Google+ badge, which we’re introducing in the coming days. This badge lets people add your page to their circles without leaving your site, and allows them to get updates from your site via Google+.

Extend the power of +1, stand out in Google search
You can also link your site to your Google+ page so that all your +1s -- from your Page, your website, and search results -- will get tallied together and appear as a single total. Potential visitors will be more likely to see the recommendations your site has received, whether they’re looking at a search result, your website, or your Page, meaning your +1’s will reach not only the 40 million users of Google+, but all the people who come to Google every day. You can link your site to your Page either using the Google+ badge or with a piece of code. To set this up, visit our Google+ badge configuration tool.

Bringing Google+ to the rest of Google

Our ultimate vision for Google+ is to transform the overall Google experience -- weaving identity and sharing into all of our products. Beginning today, we’re rolling out a new experimental feature to a small group of eligible publishers, Google+ Direct Connect -- an easy way for your audience to find your Google+ Page on Google search. If you’ve linked your Page to your site and you qualify, when someone searches for your website’s name with the ‘+’ sign before it Direct Connect will send them directly to your Page. For example, try searching for ‘+YouTube’ on Google. Users will also be prompted to automatically add Pages they find through Direct Connect to their circles.

Direct Connect suggestions start populating as you type on Google.com

Just the beginning
We want to help you get your site on Google+ as soon as possible, so we’re opening the field trial for Google+ Pages to everyone today. Creating a Google+ Page only takes a few minutes. To get started, you’ll need a personal Google+ profile. If you don’t have a Google account, it’s very quick and easy to join. And if you’re looking for inspiration, check out some of the sites that are already starting to set up their Pages:

To learn more about how Google+ works for your site, check out the Google+ Your Business site. We’re just getting started, and have many more features planned for the coming weeks and months. To keep up to date on the latest news and tips, add the Google+ Your Business page to your circles. If you have ideas on how we can improve Google+ for your site, we’d love to hear them.

Posted by Dennis Troper, Product Management Director, Google+ Pages

GET, POST, and safely surfacing more of the web

Tuesday, November 01, 2011

Webmaster Level: Intermediate to Advanced

As the web evolves, Google’s crawling and indexing capabilities also need to progress. We improved our indexing of Flash, built a more robust infrastructure called Caffeine, and we even started crawling forms where it makes sense. Now, especially with the growing popularity of JavaScript and, with it, AJAX, we’re finding more web pages requiring POST requests -- either for the entire content of the page or because the pages are missing information and/or look completely broken without the resources returned from POST. For Google Search this is less than ideal, because when we’re not properly discovering and indexing content, searchers may not have access to the most comprehensive and relevant results.

We generally advise to use GET for fetching resources a page needs, and this is by far our preferred method of crawling. We’ve started experiments to rewrite POST requests to GET, and while this remains a valid strategy in some cases, often the contents returned by a web server for GET vs. POST are completely different. Additionally, there are legitimate reasons to use POST (e.g., you can attach more data to a POST request than a GET). So, while GET requests remain far more common, to surface more content on the web, Googlebot may now perform POST requests when we believe it’s safe and appropriate.

We take precautions to avoid performing any task on a site that could result in executing an unintended user action. Our POSTs are primarily for crawling resources that a page requests automatically, mimicking what a typical user would see when they open the URL in their browser. This will evolve over time as we find better heuristics, but that’s our current approach.

Let’s run through a few POSTs request scenarios that demonstrate how we’re improving our crawling and indexing to evolve with the web.

Examples of Googlebot’s POST requests

Crawling a page via a POST redirect
<html>
  <body >
    <form name="foo" action="request.php" method="post">       <input type="hidden" name="bar" value="234"/>
    </form>
  </body>
</html>
Crawling a resource via a POST XMLHttpRequest
In this step-by-step example, we improve both the indexing of a page and its Instant Preview by following the automatic XMLHttpRequest generated as the page renders.
1. Google crawls the URL, yummy-sundae.html.
2. Google begins indexing yummy-sundae.html and, as a part of this process, decides to attempt to render the page to better understand its content and/or generate the Instant Preview.
3. During the render, yummy-sundae.html automatically sends an XMLHttpRequest for a resource, hot-fudge-info.html, using the POST method.
  
  <html>
    <head>
      <title>Yummy Sundae</title>
      <script src="jquery.js"></script>
    </head>
    <body>
      This page is about a yummy sundae.
      <div id="content"></div>
      <script type="text/javascript">
        $(document).ready(function() {
          $.post('hot-fudge-info.html', function(data)
            {$('#content').html(data);});
        });
      </script>
    </body>
  </html>
4. The URL requested through POST, hot-fudge-info.html, along with its data payload, is added to Googlebot’s crawl queue.
5. Googlebot performs a POST request to crawl hot-fudge-info.html.
6. Google now has an accurate representation of yummy-sundae.html for Instant Previews. In certain cases, we may also incorporate the contents of hot-fudge-info.html into yummy-sundae.html.
7. Google completes the indexing of yummy-sundae.html.
8. User searches for [hot fudge sundae].
9. Google’s algorithms can now better determine how yummy-sundae.html is relevant for this query, and we can properly display a snapshot of the page for Instant Previews.

Improving your site’s crawlability and indexability

General advice for creating crawlable sites is found in our Help Center. For webmasters who want to help Google crawl and index their content and/or generate the Instant Preview, here are a few simple reminders:

Prefer GET for fetching resources, unless there’s a specific reason to use POST.
Verify that we're allowed to crawl the resources needed to render your page. In the example above, if hot-fudge-info.html is disallowed by robots.txt, Googlebot won't fetch it. More subtly, if the JavaScript code that issues the XMLHttpRequest is located in an external .js file disallowed by robots.txt, we won't see the connection between yummy-sundae.html and hot-fudge-info.html, so even if the latter is not disallowed itself, that may not help us much. We've seen even more complicated chains of dependencies in the wild. To help Google better understand your site it's almost always better to allow Googlebot to crawl all resources.

You can test whether resources are blocked through Webmaster Tools “Labs -> Instant Previews.”
Make sure to return the same content to Googlebot as is returned to users’ web browsers. Cloaking (sending different content to Googlebot than to users) is a violation of our Webmaster Guidelines because, among other things, it may cause us to provide a searcher with an irrelevant result -- the content the user views in their browser may be a complete mismatch from what we crawled and indexed. We’ve seen numerous POST-request examples where a webmaster non-maliciously cloaked (which is still a violation), and their cloaking -- on even the smallest of changes -- then caused JavaScript errors that prevented accurate indexing and completely defeated their reason for cloaking in the first place. Summarizing, if you want your site to be search-friendly, cloaking is an all-around sticky situation that’s best to avoid.

To verify that you're not accidentally cloaking, you can use Instant Previews within Webmaster Tools, or try setting the User-Agent string in your browser to something like:

Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)

Your site shouldn't look any different after such a change. If you see a blank page, a JavaScript error, or if parts of the page are missing or different, that means that something's wrong.
Remember to include important content (i.e., the content you’d like indexed) as text, visible directly on the page and without requiring user-action to display. Most search engines are text-based and generally work best with text-based content. We’re always improving our ability to crawl and index content published in a variety of ways, but it remains a good practice to use text for important information.

Controlling your content

If you’d like to prevent content from being crawled or indexed for Google Web Search, traditional robots.txt directives remain the best method. To prevent the Instant Preview for your page(s), please see our Instant Previews FAQ which describes the “Google Web Preview” User-Agent and the nosnippet meta tag.

Moving forward

We’ll continue striving to increase the comprehensiveness of our index so searchers can find more relevant information. And we expect our crawling and indexing capability to improve and evolve over time, just like the web itself. Please let us know if you have questions or concerns.

Written by Pawel Aleksander Fedorynski, Software Engineer, Indexing Team, and Maile Ohye, Developer Programs Tech Lead

Webmaster Central Blog