[go: nahoru, domu]

Note from the editors: After previously looking into various ways to handle internationalization for Google’s web-search, here’s a post from Google Web Studio team members with tips for web developers.

Many websites exist in more than one language, and more and more websites are made available for more than one language. Yet, building a website for more than one language doesn’t simply mean translation, or localization (L10N), and that’s it. It requires a few more things, all of which are related to internationalization (I18N). In this post we share a few tips for international websites.

1. Make pages I18N-ready in the markup, not the style sheets

Language and directionality are inherent to the contents of the document. If possible you should hence always use markup, not style sheets, for internationalization purposes. Use @lang and @dir, at least on the html element:

<html lang="ar" dir="rtl">

Avoid coming up with your own solutions like special classes or IDs.

As for I18N in style sheets, you can’t always rely on CSS: The CSS spec defines that conforming user agents may ignore properties like direction or unicode-bidi. (For XML, the situation changes again. XML doesn’t offer special internationalization markup, so here it’s advisable to use CSS.)

2. Use one style sheet for all locales

Instead of creating separate style sheets for LTR and RTL directionality, or even each language, bundle everything in one style sheet. That makes your internationalization rules much easier to understand and maintain.

So instead of embedding an alternative style sheet like

<link href="default.rtl.css" rel="stylesheet">

just use your existing

<link href="default.css" rel="stylesheet">

When taking this approach you’ll need to complement existing CSS rules by their international counterparts:

3. Use the [dir='rtl'] attribute selector

Since we recommend to stick with the style sheet you have (tip #2), you need a different way of selecting elements you need to style differently for the other directionality. As RTL contents require specific markup (tip #1), this should be easy: For most modern browsers, we can simply use [dir='rtl'].

Here’s an example:

aside {
 float: right;
 margin: 0 0 1em 1em;
}

[dir='rtl'] aside {
 float: left;
 margin: 0 1em 1em 0; 
}

4. Use the :lang() pseudo class

To target documents of a particular language, use the :lang() pseudo class. (Note that we’re talking documents here, not text snippets, as targeting snippets of a particular language makes things a little more complex.)

For example, if you discover that bold formatting doesn’t work very well for Chinese documents (which indeed it does not), use the following:

:lang(zh) strong,
:lang(zh) b {
 font-weight: normal;
 color: #900;
}

5. Mirror left- and right-related values

When working with both LTR and RTL contents it’s important to mirror all the values that change directionality. Among the properties to watch out for is everything related to borders, margins, and paddings, but also position-related properties, float, or text-align.

For example, what’s text-align: left in LTR needs to be text-align: right in RTL.

There are tools to make it easy to “flip” directionality. One of them is CSSJanus, though it has been written for the “separate style sheet” realm, not the “same style sheet” one.

6. Keep an eye on the details

Watch out for the following items:
  • Images designed for left or right, like arrows or backgrounds, light sources in box-shadow and text-shadow values, and JavaScript positioning and animations: These may require being swapped and accommodated for in the opposite directionality.
  • Font sizes and fonts, especially for non-Latin alphabets: Depending on the script and font, the default font size may be too small. Consider tweaking the size and, if necessary, the font.
  • CSS specificity: When using the [dir='rtl'] (or [dir='ltr']) hook (tip #2), you’re using a selector of higher specificity. This can lead to issues. Just have an eye out, and adjust accordingly.

If you have any questions or feedback, check the Internationalization Webmaster Help Forum, or leave your comments here.

Webmaster level: All

If Google understands your website’s content in a structured way, we can present that content more accurately and more attractively to Google users. For example, our algorithms can enhance your search results with “rich snippets” when we understand that your page is a structured product listing, event, recipe, review, or similar. We can also feature your data in Knowledge Graph panels or in Google Now cards, helping to spread the word about your content.

Today we’re excited to announce two features that make it simpler than ever before to participate in structured data features. The first is an expansion of Data Highlighter to seven new types of structured data. The second is a brand new tool, the Structured Data Markup Helper.

Support for Products, Businesses, Reviews and more in Data Highlighter

Data Highlighter launched in December 2012 as a point-and-click tool for teaching Google the pattern of structured data about events on your website — without even having to edit your site’s HTML. Now, you can also use Data Highlighter to teach us about many other kinds of structured data on your site: products, local businesses, articles, software applications, movies, restaurants, and TV episodes. Update: You can see the full list of schemas supported in Data Highlighter here.

To get started, visit Webmaster Tools, select your site, click the "Optimization" link in the left sidebar, and click "Data Highlighter". You’ll be prompted to enter the URL of a typically structured page on your site (for example, a product or event’s detail page) and “tag” its key fields with your mouse.

Google Structured Data Highlighter

The tagging process takes about 5 minutes for a single page, or about 15 minutes for a pattern of consistently formatted pages. At the end of the process, you’ll have the chance to verify Google’s understanding of your structured data and, if it’s correct, “publish” it to Google. Then, as your site is recrawled over time, your site will become eligible for enhanced displays of information like prices, reviews, and ratings right in the Google search results.

New Structured Data Markup Helper tool

While Data Highlighter is a great way to quickly teach Google about your site’s structured data without having to edit your HTML, it’s ultimately preferable to embed structured data markup directly into your web pages, so your structured content is available to everyone. To assist web authors with that task, we’re happy to announce a new tool: the Structured Data Markup Helper.

Like in Data Highlighter, you start by submitting a web page (URL or HTML source) and using your mouse to “tag” the key properties of the relevant data type. When you’re done, the Structured Data Markup Helper generates sample HTML code with microdata markup included. This code can be downloaded and used as a guide as you implement structured data on your website.

Structured Data Markup Helper

The Structured Data Markup Helper supports a subset of data types, including all the types supported by Data Highlighter as well as several types used for embedding structured data in Gmail. Consult schema.org for complete schema documentation.

We hope these two tools make it easier for all websites to participate in Google’s growing suite of structured data features! As always, please post in our forums if you have any questions or feedback.

Webmaster level: all

Today, we’re launching support for the schema.org markup for organization logos, a way to connect your site with an iconic image. We want you to be able to specify which image we use as your logo in Google search results.

Using schema.org Organization markup, you can indicate to our algorithms the location of your preferred logo. For example, a business whose homepage is www.example.com can add the following markup using visible on-page elements on their homepage:

<div itemscope itemtype="http://schema.org/Organization">
  <a itemprop="url" href="http://www.example.com/">Home</a>
  <img itemprop="logo" src="http://www.example.com/logo.png" />
</div>

Update 21 October 2014: You can also use any other supported syntax such as this JSON-LD code:

<script type="application/ld+json">
      {
      "@context": "http://schema.org/",
      "@type": "Organization",
      "url": "http://www.example.com/",
      "logo": "http://www.example.com/logo.png"
      }
    </script>
This example indicates to Google that this image is designated as the organization’s logo image for the homepage also included in the markup, and, where possible, may be used in Google search results. Markup like this is a strong signal to our algorithms to show this image in preference over others, for example when we show Knowledge Graph on the right hand side based on users’ queries.

As always, please ask us in the Webmaster Help Forum if you have any questions.

Webmaster Level: All

The homepages of multinational and multilingual websites are sometimes configured to point visitors to localized pages, either via redirects or by changing the content to reflect the user’s language. Today we’ll introduce a new rel-alternate-hreflang annotation that the webmaster can use to specify such homepages that is supported by both Google and Yandex.

To see this in action, let’s look at an example. The website example.com has content that targets users around the world as follows:

Map of the world illustrating which hreflang code to use for which locale

In this case, the webmaster can annotate this cluster of pages using rel-alternate-hreflang using Sitemaps or using HTML link tags like this:


<link rel="alternate" href="http://example.com/en-gb" hreflang="en-gb" />
<link rel="alternate" href="http://example.com/en-us" hreflang="en-us" />
<link rel="alternate" href="http://example.com/en-au" hreflang="en-au" />
<link rel="alternate" href="http://example.com/" hreflang="x-default" />

The new x-default hreflang attribute value signals to our algorithms that this page doesn’t target any specific language or locale and is the default page when no other page is better suited. For example, it would be the page our algorithms try to show French-speaking searchers worldwide or English-speaking searchers on google.ca.

The same annotation applies for homepages that dynamically alter their contents based on a user’s perceived geolocation or the Accept-Language headers. The x-default hreflang value signals to our algorithms that such a page doesn’t target a specific language or locale.

As always, if you have any questions or feedback, please tell us in the Internationalization Webmaster Help Forum.

Webmaster Level: Intermediate to Advanced

Including a rel=canonical link in your webpage is a strong hint to search engines your about preferred version to index among duplicate pages on the web. It’s supported by several search engines, including Yahoo!, Bing, and Google. The rel=canonical link consolidates indexing properties from the duplicates, like their inbound links, as well as specifies which URL you’d like displayed in search results. However, rel=canonical can be a bit tricky because it’s not very obvious when there’s a misconfiguration.


While the webmaster sees the “red velvet” page on the left in their browser, search engines notice on the webmaster’s unintended “blue velvet” rel=canonical on the right.

We recommend the following best practices for using rel=canonical:
  • A large portion of the duplicate page’s content should be present on the canonical version.
  • One test is to imagine you don’t understand the language of the content—if you placed the duplicate side-by-side with the canonical, does a very large percentage of the words of the duplicate page appear on the canonical page? If you need to speak the language to understand that the pages are similar; for example, if they’re only topically similar but not extremely close in exact words, the canonical designation might be disregarded by search engines.
  • Double-check that your rel=canonical target exists (it’s not an error or “soft 404”)
  • Verify the rel=canonical target doesn’t contain a noindex robots meta tag
  • Make sure you’d prefer the rel=canonical URL to be displayed in search results (rather than the duplicate URL)
  • Include the rel=canonical link in either the <head> of the page or the HTTP header
  • Specify no more than one rel=canonical for a page. When more than one is specified, all rel=canonicals will be ignored.
Mistake 1: rel=canonical to the first page of a paginated series

Imagine that you have an article that spans several pages:
  • example.com/article?story=cupcake-news&page=1
  • example.com/article?story=cupcake-news&page=2
  • and so on
Specifying a rel=canonical from page 2 (or any later page) to page 1 is not correct use of rel=canonical, as these are not duplicate pages. Using rel=canonical in this instance would result in the content on pages 2 and beyond not being indexed at all.


Good content (e.g., “cookies are superior nutrition” and “to vegetables”) is lost when specifying rel=canonical from component pages to the first page of a series.



rel=canonical from component pages to the view-all page


If rel=canonical to a view-all page isn’t designated, paginated content can use rel=”prev” and rel=”next” markup.

Mistake 2: Absolute URLs mistakenly written as relative URLs


The <link> tag, like many HTML tags, accepts both relative and absolute URLs. Relative URLs include a path “relative” to the current page. For example, “images/cupcake.png” means “from the current directory go to the “images” subdirectory, then to cupcake.png.” Absolute URLs specify the full path—including the scheme like http://.

Specifying <link rel=canonical href=“example.com/cupcake.html” /> (a relative URL since there’s no “http://”) implies that the desired canonical URL is http://example.com/example.com/cupcake.html even though that is almost certainly not what was intended. In these cases, our algorithms may ignore the specified rel=canonical. Ultimately this means that whatever you had hoped to accomplish with this rel=canonical will not come to fruition.

Mistake 3: Unintended or multiple declarations of rel=canonical

Occasionally, we see rel=canonical designations that we believe are unintentional. In very rare circumstances we see simple typos, but more commonly a busy webmaster copies a page template without thinking to change the target of the rel=canonical. Now the site owner’s pages specify a rel=canonical to the template author’s site.


If you use a template, check that you didn’t also copy the rel=canonical specification.

Another issue is when pages include multiple rel=canonical links to different URLs. This happens frequently in conjunction with SEO plugins that often insert a default rel=canonical link, possibly unbeknownst to the webmaster who installed the plugin. In cases of multiple declarations of rel=canonical, Google will likely ignore all the rel=canonical hints. Any benefit that a legitimate rel=canonical might have offered will be lost.

In both these types of cases, double-checking the page’s source code will help correct the issue. Be sure to check the entire <head> section as the rel=canonical links may be spread apart.


Check the behavior of plugins by looking at the page’s source code.

Mistake 4: Category or landing page specifies rel=canonical to a featured article

Let’s say you run a site about desserts. Your dessert site has useful category pages like “pastry” and “gelato.” Each day the category pages feature a unique article. For instance, your pastry landing page might feature “red velvet cupcakes.” Because the “pastry” category page has nearly all the same content as the “red velvet cupcake” page, you add a rel=canonical from the category page to the featured individual article.

If we were to accept this rel=canonical, then your pastry category page would not appear in search results. That’s because the rel=canonical signals that you would prefer search engines display the canonical URL in place of the duplicate. However, if you want users to be able to find both the category page and featured article, it’s best to only have a self-referential rel=canonical on the category page, or none at all.


Remember that the canonical designation also implies the preferred display URL. Avoid adding a rel=canonical from a category or landing page to a featured article.

Mistake 5: rel=canonical in the <body>

The rel=canonical link tag should only appear in the <head> of an HTML document. Additionally, to avoid HTML parsing issues, it’s good to include the rel=canonical as early as possible in the <head>. When we encounter a rel=canonical designation in the <body>, it’s disregarded.

This is an easy mistake to correct. Simply double-check that your rel=canonical links are always in the <head> of your page, and as early as possible if you can.


rel=canonical designations in the <head> are processed, not the <body>.

Conclusion

To create valuable rel=canonical designations:
  • Verify that most of the main text content of a duplicate page also appears in the canonical page.
  • Check that rel=canonical is only specified once (if at all) and in the <head> of the page.
  • Check that rel=canonical points to an existent URL with good content (i.e., not a 404, or worse, a soft 404).
  • Avoid specifying rel=canonical from landing or category pages to featured articles as that will make the featured article the preferred URL in search results.
And, as always, please ask any questions in our Webmaster Help forum.

Webmaster level: All

Since we launched the Webmaster Academy in English back in May 2012, its educational content has been viewed well over 1 million times.

The Webmaster Academy was built to guide webmasters in creating great sites that perform well in Google search results. It is an ideal guide for beginner webmasters but also a recommended read for experienced users who wish to learn more about advanced topics.

To support webmasters across the globe, we’re happy to announce that we’re launching the Webmaster Academy in 20 languages. So whether you speak Japanese or Italian, we hope we can help you to make even better websites! You can easily access it through Webmaster Central.

We’d love to read your comments here and invite you to join the discussion in the help forums.


Webmasters have several ways to keep their sites' content out of Google's search results. Today, as promised, we're providing a way for websites to opt out of having their content that Google has crawled appear on Google Shopping, Advisor, Flights, Hotels, and Google+ Local search.

Webmasters can now choose this option through our Webmaster Tools, and crawled content currently being displayed on Shopping, Advisor, Flights, Hotels, or Google+ Local search pages will be removed within 30 days.