
Inside Google Sitemaps: December 2005

Your source for product news and developments

www vs non-www versions of a site


Two URLs to a site -- one that is prefaced with www and one that is not (for instance, http://www.example.com/ and http://example.com/) -- often point to the same location on a server. But depending on the server configuration, they may point to different locations, so search engines can't assume they are the same.

This post provides tips for viewing stats if the www and non-www versions of your site point to the same location.

If you have added your site without prefacing the domain with www (for instance, http://example.com/), and the www version of this domain points to the same location, try adding the www version of the domain (for instance, http://www.example.com/) to your account. You may see a wider variety of stats for the www version of the domain.

You can add a site by:
  1. Clicking the Add tab.
  2. Scrolling to the "If you don't have a Sitemap" area and entering the site URL.
  3. Clicking Add Site.
Similarly, if you have added the www version of the domain and the non-www version points to the same location, you can add the non-www version to see stats for that version.

If the verification file still exists in the root of your site and both versions of the domain point to the same location, you can verify the second version simply by accessing the Verify tab and clicking the Verify button.

Note that having both versions of the site's URL listed in your account won't affect the indexing of your site as long as you have submitted a Sitemap for only one version - the version you want to be indexed. Don't submit a Sitemap for both versions if the location and content are the same.

If both domains point to the same location and you have pages indexed under both versions, see our Google Help Center for more information on consolidating the listings under one domain.

We hear requests for help with this often, so we'll be looking at ways to address this issue in the coming months.

Permalink

Verifying a site located in a subdirectory


If your site is located in a subdirectory, rather than at the root level of a domain (for instance, at http://www.example.com/site/), you are given two choices when you verify. You can either verify at the subdirectory level or at the root level.


If you verify at the root level (http://www.example.com/), we can show you a greater variety of statistics.

If you are unable to upload a file at the root level, you can still view error information for the subdirectory at which you've verified, as well as information about your Sitemaps. Verification does not affect your Sitemap submission or the crawling or indexing of your site.

If you verify at the subdirectory level, some statistics are not available. Instead, some stats pages provide a link for verifying at the root level.



If you have root-level access, simply click the verify link, place the requested verification file in the root directory and then click Verify.

Some questions you may have about verifying at the subdirectory level:

I verified successfully at http://www.example.com/site/. But when I access my stats, I see a message asking me to verify again. Why is it asking me this since I already verified?
That message is providing a link so you can verify at the root level. Rest assured that your site is still verified at the subdirectory level and you can access all information that we have available at this time for subdirectories (index stats, Sitemap details, and site errors).

I verified successfully at http://www.example.com/site/. But I can still see the Verify tab. When I access that tab and click "Verify", I get a message that Google couldn't find my verification file and that my site isn't verified. The original verification file is still present at http://www.example.com/site/, so why am I getting this message?
If you are successfully verified at the subdirectory level, the Verify tab will still appear so that you can later go back and verify at the root level. When you are already verified at the subdirectory level, we look for the verification file in the root of the domain when you try to verify again. You'll see an error message if we don't find it there.

I can't verify at the root. Will this affect my listings in Google?
No. Verification at the root lets you see a greater variety of statistics for your site. It does not affect how we use your Sitemap, how we crawl your site, your PageRank, or any other factor.

A previous version let me verify at the subdirectory level. Why did you change it in this version?
You can still verify at the subdirectory level as you could before and see everything you could before (everything listed under the Sitemaps tab and Errors tab). We've added the option of verifying at the root, which lets you see root-level stats.

Permalink

More query stats; verification enhancements


We've made a few improvements that we wanted to let you know about.

Expanded query stats
We've expanded the number of query stats that we show you (both top search queries and top search query clicks). Note that the exact number of stats you see depends on how often your site comes up in search results. A large site that has been around for a while will generally have more stats than a new, smaller site.

Verification enhancements
We've also been working on the verification process. Last month, we posted to the Group that we were looking for your input on how to improve this process. Yesterday, we told you that we added support for lowercase verification filenames. Now, we've added another improvement. Some of you have received one of the following messages when you try to verify:

Our system has experienced a temporary problem.

Or,

The system is currently busy. Please try again in a few minutes.

In the past, you've had to manually try again later. Now, instead of seeing one of those messages, you'll see this message:

The system is currently busy. We will process this verification as soon as possible. Please check back later for an updated status.

We'll add your verification request to a queue and try it for you later, so that you don't have to keep trying manually.

If you receive another error that might be due to a temporary issue (for instance, if we can't reach your server due to a DNS timeout), we'll show you the error, but we'll also add your verification request to the queue.
  • While your request is in the queue, you'll see "PENDING" as the status in the Verify tab.
  • Once we've processed the request successfully, we will display your site as verified and you can access your stats.
  • If we aren't able to successfully process the request, you'll see "NOT VERIFIED" as the status in the Verify tab. If you see this, check that your verification file is in the correct location, is named exactly as we ask, that your robots.txt file doesn't block access, and that your server is up and responding to requests. Then, try your verification request again.

Permalink

Lowercase verification filenames


We've been looking into ways to enhance our verification process and noticed that some members of our Group have been unable to upload case-sensitive filenames to their webservers. To help them, we now check for a lowercase verification filename if we don't find the case-sensitive one. So now you can upload a verification file with the lowercase filename. (If you have already verified, you don't need to do anything new.)
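
As a rough illustration of the fallback described above, the lookup order can be sketched in Python. This is not Google's implementation: the function name candidate_filenames and the sample filename used below are hypothetical.

```python
from typing import List


def candidate_filenames(filename: str) -> List[str]:
    """Return the filenames to look for, in order of preference.

    The case-sensitive name is tried first; the all-lowercase name is the
    fallback. Hypothetical helper, for illustration only.
    """
    candidates = [filename]
    lowered = filename.lower()
    if lowered != filename:
        candidates.append(lowered)
    return candidates
```

For a made-up name such as GOOGLEabc123.html, this yields the original name followed by googleabc123.html.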

Thanks to our Groups members for their input -- and keep that feedback coming!

Permalink

New version of Sitemap Generator


We recently uploaded a new version (v1.4) of the Sitemap Generator tool.

This version has the same features as the last one, but fixes a subtle bug in writing GZip compressed Sitemap files. The old version stored more path information than it needed to when it created GZip files, and this was a point of concern for some webmasters.

The bug was found, and the bugfix suggested, by members of the Sitemaps community. Thanks for bringing it to our attention.

Permalink

Trouble with verification


Some of you have had one of the following responses when trying to verify:

"Our system has experienced a temporary problem."

or,

" The system is currently busy. Please try again in a few minutes."

However, when you do try again later, you continue to get one of these messages. This is a known issue and we are working to resolve it as quickly as possible. Thanks for your patience.

Permalink

If you don't see the full range of stats


The stats we show you are all about what Google knows about your site. If some stats aren't available, it's probably because we don't know a whole lot about your site yet. As we learn more about it (from your Sitemap and our crawling mechanisms), we'll have more information to show you.

If your site is new to our index, if it doesn't yet have a lot of pages indexed, or if not many searches have brought up results from your site, we may not have a lot of stats available yet. You can easily get an idea of your indexed pages by accessing the Index stats page and clicking the link in the Indexed pages in your site row of the table. Over time, our knowledge about your site will increase and we can share that with you.

Permalink

Third-party programs


We've just updated the list of third-party programs that you can use to create Sitemaps. You can check out the current list on code.google.com. That page also lists the Sitemaps Google Group for each language we support.

If you've written a tool that supports Google Sitemaps, let us know at code+sitemaps@google.com.

The Sitemaps team is thankful for our great user community and appreciates all the work that's gone into these tools.

Permalink

Sitemaps in Japanese


Last week, we added support for Japanese. As with the other languages we support, if you already use Google in Japanese, you should see the user interface and documentation in Japanese automatically. Otherwise, you can click the Preferences link from the Google home page and choose Japanese from the interface list.

We have also added a Japanese Sitemaps Google Group.

Permalink

Site Verification


This morning we learned of an issue with the Google Sitemaps tool that may have temporarily enabled users to view statistics about sites they do not own. We acted quickly and fixed the issue. To ensure the security of all sites using the Google Sitemaps tool, we will re-verify all sites added in the last 48 hours.

When we first started showing statistics a couple of months ago, we put a system in place to prevent anyone other than site owners from seeing stats for a site. We ask each site owner to place a unique file on the site and then we check to see if that file exists. When we do that check, we first make sure that the server isn't misconfigured to return a valid page when a request is made for a page that doesn't exist. We only verify sites that are configured correctly. You can read more about that process in our documentation.
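
The two-step check described above can be sketched in Python. This is only an illustration under stated assumptions, not the actual verification code: the names can_verify and fetch_status are hypothetical, and treating exactly 404 as the "correct" answer for a missing page is an assumption.

```python
import uuid
from typing import Callable


def can_verify(site_root: str, verification_file: str,
               fetch_status: Callable[[str], int]) -> bool:
    """Return True only if the server is configured correctly and the file exists.

    fetch_status is a caller-supplied function that returns the HTTP status
    code for a URL (hypothetical; any HTTP client could back it).
    """
    base = site_root.rstrip("/") + "/"

    # Step 1: request a page that should not exist. A server that answers
    # 200 here would appear to "confirm" any filename, so refuse to verify.
    probe = base + "no-such-page-" + uuid.uuid4().hex + ".html"
    if fetch_status(probe) != 404:
        return False

    # Step 2: only now check for the owner's unique verification file.
    return fetch_status(base + verification_file) == 200
```

Passing the status lookup in as a function keeps the sketch independent of any particular HTTP library and makes the misconfigured-server case easy to simulate.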

Unfortunately, with our latest release, a bug prevented this process from working correctly. We fixed this as soon as we found out about the problem. We take your privacy very seriously and are currently investigating other approaches to further enhance security.

Permalink

More stats!


We've just launched some new features. The biggest change for those of you already using Sitemaps: if you've verified your site, you'll see substantially more stats and error details.

The biggest change for new users: you can now add a site to your account even before you create a Sitemap for it. Once you've added and verified your site, you can see all these new stats and errors.

New stats: how the Googlebot views your site
The new stats we're showing you are all about letting you know how we see your site.

With query stats, we show you the top Google search queries that return pages from your site as well as the top queries that caused users to click on your site in the search results.



With crawl stats, you can see how we view crawled pages. You can see a distribution of the pages successfully crawled and the pages with errors as well as a distribution of PageRank for the pages in your site.



Page analysis shows you what we detect about the content and encoding of your pages.

Index stats provide an easy way for you to use our advanced search operators to return results about how we see the indexed pages of your site.

Mobile stats
You can now verify your mobile sites and see stats for them.

More detailed errors
Now you’ll have more details about problems we had crawling your site. We report on 40 different types of errors in 5 categories.



Adding a site
Even before you have a Sitemap for your site, you can still take advantage of all the statistics and error information we have available. Simply create a Sitemaps account and add a site to it. Once you verify site ownership, we provide the full range of statistics and error details. Of course, we encourage you to add a Sitemap so we can learn more about your site.

Permalink

Changing domains


From our Google Group:

I've moved my site to a new domain. Can I submit a Sitemap to tell you to index the new site rather than the old site?

Submitting a Sitemap for the new site is a great first step, because that helps us learn about the new pages right away. Make sure you place the new Sitemap in the root directory of the new site as the Sitemap must be located on the same domain as the site URLs contained in it.

Another important thing to do is redirect visitors from the old site to the new one. Put a 301 (permanent) redirect on every page of the old site to point to the corresponding page on the new site.
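
A minimal Python sketch of the page-to-page mapping follows. The helper names and the example hosts are made up for illustration, and the sketch assumes the new site keeps the same paths as the old one; how the redirect is actually served depends on your webserver.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(old_url: str, new_host: str) -> str:
    """Map a URL on the old domain to the corresponding URL on the new one."""
    parts = urlsplit(old_url)
    # Keep scheme, path, query and fragment; swap only the host.
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))


def redirect_response(old_url: str, new_host: str) -> tuple:
    """Return (status, headers) for a 301 (permanent) redirect."""
    return 301, {"Location": redirect_target(old_url, new_host)}
```

The important detail is that each old page points to its own counterpart rather than to the new home page.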

You can find out more about 301 HTTP redirects from RFC-2616 and you can learn more about how to make the site move a smooth one from our Google Help Center.

Permalink

URLs with HTTP errors


What can you do about URLs that we tried to crawl but couldn't because we received a 404 (not found) error? (You can see these once you've verified your site by clicking the stats link beside the Sitemap name on the My Sitemaps page.)

You don't have to do anything about them. We'll continue to crawl and index your site and will simply skip pages that return a 404. But here are some things you can do.

If we found the URLs from your Sitemap, the fix is simple. Just modify your Sitemap to list the correct pages and resubmit it.

If we found the URLs by following links, the fix isn't quite as easy. In fact, in some cases, there may be no fix. A webmaster may have liked your site and tried to link to it, but mistyped the URL. You can look for sites that link to your pages and ask webmasters to fix any broken links, but if that sounds like a lot of work, you can instead just focus on your own site.

Check the links in your site
You may not be able to control inbound links from other sites, but you can control internal links. Make sure that none of these broken links are coming from your site. You can generally check your webserver logs to see what visitors clicked on in your site that returned 404 errors.
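
One way to do that log check is sketched below in Python. It assumes your server writes the Apache "combined" log format (which records the referring page); the function name and the sample site prefix are hypothetical, so adjust the pattern for your own server.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption; adjust for your server).
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def internal_broken_links(lines, site_prefix):
    """Count (referrer, path) pairs where a page on this site led to a 404."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "404":
            continue
        referrer = match.group("referrer")
        # Keep only requests that were referred by a page on your own site.
        if referrer.startswith(site_prefix):
            hits[(referrer, match.group("path"))] += 1
    return hits
```

Each key in the result names the page on your site that contains the broken link and the URL it points to.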

If the links are outdated
It could be that a link points to a non-existent page because that page used to exist, but no longer does. In that case, you can:
  • Make sure that your site doesn't link to any outdated pages
  • Check to see if any of these outdated pages are in the Google index
If the page used to exist, but no longer does, it might still be listed in the Google index. (You can check this by doing a Google search for the URL.) If it is, you can either wait for subsequent crawls (if we continue to get 404 errors when we try to crawl it, it will eventually be removed), or you can request its removal using our automatic URL removal system.

In order to use this system, the outdated page must return a 404 (and if the URL is showing up on your Sitemaps Stats page, it already does). Log in and then choose the Remove an outdated link option. Type in the URL, choose anything associated with this URL and click Remove outdated link. The link will show up in a status area as pending. The page should be removed from the index within 3-5 days and the status will be updated.

Permalink

What to do when your Sitemap status is "Denied URLs"


If your Sitemap status is "Denied URLs" and the error listed is "URL not under Sitemap path", here are some things to check.

Make sure the URL root matches
If you submit your Sitemap using the path http://example.com/sitemap.xml, then the URLs in your Sitemap should begin with example.com. Any URLs that begin with www.example.com aren't considered to be under the Sitemap path. Along those lines, if you submit your Sitemap using the path http://www.example.com/sitemap.xml, the URLs in that Sitemap should begin with www.example.com.

To fix this problem, you can either edit the URLs listed in your Sitemap file to match the submitted path, or you can delete the Sitemap and then submit it again using the path that matches the URLs listed in it.

Make sure the Sitemap is at the highest-level directory
If you submit your Sitemap using the path http://www.example.com/sample_folder/sitemap.xml, then all URLs in that Sitemap must begin with that path. This means that http://www.example.com/sample.html wouldn't be considered a valid URL in the Sitemap.

If at all possible, place your Sitemap at the root location of your site to avoid these types of problems. If you can't place the Sitemap at the root, then list only URLs from the Sitemap location and lower.
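
Both checks can be combined into one small Python function, which you could run over your Sitemap before submitting it. This is a sketch of the rules as described in this post, not the code Google runs; the function name is hypothetical, and requiring the scheme to match as well is an assumption.

```python
from urllib.parse import urlsplit


def is_under_sitemap_path(sitemap_url: str, page_url: str) -> bool:
    """Return True if page_url is at or below the Sitemap's location."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Scheme and host must match exactly: www.example.com != example.com.
    if (sitemap.scheme, sitemap.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    # The page must live in the Sitemap's directory or below it.
    directory = sitemap.path.rsplit("/", 1)[0] + "/"
    return page.path.startswith(directory)
```

With the examples from this post, http://www.example.com/sample.html is rejected for a Sitemap at http://www.example.com/sample_folder/sitemap.xml, and any www.example.com URL is rejected for a Sitemap submitted as http://example.com/sitemap.xml.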

See the Sitemaps documentation for more details.

Permalink

When your site changes


Here's another question we've gotten. If you have a question about Sitemaps, let us know by posting in the Sitemaps Google Group.

I've submitted a Sitemap for my site. What should I do when my site changes?

If you add pages to your site, you can let us know in several different ways:
  • Resubmit your Sitemap: You can add the URLs to your existing Sitemap and then resubmit it either by using your Google Account or by sending us a ping (HTTP request).

  • Create a new Sitemap: You can create a new Sitemap with only the new URLs and submit that separately using your Google Account.

  • Use a Sitemap index file: You can create a new Sitemap with only the new URLs and then list both the original Sitemap and the new Sitemap in a Sitemap index file. Then, you can simply submit the Sitemap index file. If you often add pages to your site, you can periodically create a new Sitemap with new URLs, add the new Sitemap to the Sitemap index file, and resubmit the Sitemap index file.
If you ping us to let us know about changes to your Sitemap, make sure that you have submitted that Sitemap using your Google Account at least once. The Last Submitted date shown in your Google Sitemaps account reflects only the date you last submitted the Sitemap manually, but if you receive a status OK (200) from the ping, we received it successfully.
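
A ping is an ordinary HTTP request that carries the Sitemap URL as a parameter. The Python sketch below shows only how such a request URL could be assembled; this post does not give the ping address, so the endpoint is passed in by the caller, and the parameter name sitemap is an assumption. Check the Sitemaps documentation for the real values.

```python
from urllib.parse import urlencode


def build_ping_url(ping_endpoint: str, sitemap_url: str) -> str:
    """Return the URL to request in order to ping about an updated Sitemap.

    ping_endpoint is a placeholder supplied by the caller; the "sitemap"
    parameter name is assumed, not taken from this post.
    """
    # urlencode percent-escapes the Sitemap URL so it survives as a parameter.
    return ping_endpoint + "?" + urlencode({"sitemap": sitemap_url})


def ping_succeeded(status_code: int) -> bool:
    """A 200 (OK) response means the ping was received."""
    return status_code == 200
```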

If you've deleted pages from your site, you can delete those pages from your Sitemap and then resubmit it (either manually through your Google Account or by pinging us).
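
For the Sitemap index option listed above, here is a hedged Python sketch that writes an index listing several Sitemaps. The function name is hypothetical, and the namespace shown is the one used by the current sitemaps.org protocol, which is an assumption here because this post does not name one.

```python
import xml.etree.ElementTree as ET

# Namespace of the current sitemaps.org protocol; an assumption, since the
# post does not name one. Use whatever your documentation specifies.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls, namespace=SITEMAP_NS):
    """Return a Sitemap index document listing each Sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=namespace)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")
```

When you add a new Sitemap, you would append its URL to the list, regenerate the index, and resubmit only the index file.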

Permalink

Including site pages in a Sitemap


From time to time, we use this blog to answer some common questions. Here's one:

Do I have to include every URL from my site in my Sitemap? If I don't include some of them, will they be excluded from the Google index?

Your Sitemap provides us with an additional way to learn about your site. We still use all of our other methods, such as following links from your site's HTML sitemap and from pages that link to you. We discover URLs that you don't include in your Sitemap through these regular crawling processes -- it just may take us longer, and we won't have any extra information that you can provide in a Sitemap (such as priority, last modification date, and change frequency).

We won't exclude URLs that you don't list in your Sitemap from the Google index.

Permalink

Searching what Google knows about your site


So, you've submitted your Sitemap. How can you tell what Google knows about your site?

You can use Google’s advanced search features to get a list of:
  • search results from your site
  • pages that link to your site
  • pages that refer to your site’s URL
You can also see information that we have about your site.

You can do many of these advanced searches (amazingly enough) by clicking the Advanced Search link on www.google.com. You can also use our advanced search operators in your query. If you are using operators, remember that there should not be a space between the operator and the URL. Note that we use brackets to indicate the words in the search box. The query itself should not include the brackets.
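
The no-space rule is easy to get wrong when building queries in a script, so here is a small Python sketch. The function name is hypothetical; the operator list is limited to the four operators covered in this post.

```python
OPERATORS = ("site", "link", "allinurl", "info")


def operator_query(operator: str, url: str, terms: str = "") -> str:
    """Build a search query such as 'admission site:www.stanford.edu'."""
    if operator not in OPERATORS:
        raise ValueError("unsupported operator: " + operator)
    # No space between the operator and the URL, and no brackets.
    clause = operator + ":" + url.strip()
    return (terms.strip() + " " + clause).strip()
```

The returned string is what you would type into the search box, without the brackets used in this post.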

Results from your site (site: operator)
To find pages from your site, use the site: operator. For instance, [site:www.google.com]. You can also type the URL in the Domain field of the Advanced Search page.


You can also use this feature to search through any site. Simply enter the search query followed by the site: operator and the site you want to search through. For instance, to search for admission information on the Stanford University web site, type [admission site:www.stanford.edu]. And, you can use this feature to search through sites from specific top-level domains. For instance, to search for information about Zürich on Swiss sites in the .ch domain, type [Zürich site:.ch].

Pages that link to your site (link: operator)
To find pages that link to your site, use the link: operator. For instance, [link:www.google.com]. You can also type the URL in the Links field of the Advanced Search page.



Pages that refer to your site’s URL (allinurl: operator)
To find pages that include your site’s URL, use the allinurl: operator. For instance, [allinurl:www.google.com].

Information Google has about your site (info: operator)
To see information that Google knows about your site, use the info: operator. For instance, [info:www.google.com]. That query results in the following:


Some questions you may have about these results...

I submitted my Sitemap but I don't see all the pages listed. When will they be indexed?
We can't guarantee if or when we'll index pages we receive from Sitemaps submissions. We use Sitemaps as another view into your site to augment our regular crawling methods. Also, we don't want the Googlebot to overwhelm your bandwidth, so we may not crawl it all at one time.

Some of my results are labeled "Supplemental". What does that mean?
That means that pages are part of our auxiliary index. You can read more about that in our webmaster guidelines.

Permalink

Verifying your site


Verification
Once you verify your site, we show you additional statistics. We require verification to make sure that we only show these stats to site owners.

When you verify, we ask you to upload a unique text file to a particular directory on your webserver. Periodically, we check to see if this file still exists. We do this to make sure you still own the site. If we can’t find this file when we recheck, we ask you to verify again.

Questions you might have about verification:

Once I’ve verified my site, can I delete the verification text file?
You can delete it once you’ve verified, but when we do our periodic check for it, we’ll ask you to upload it again.

Someone who used to have write-access to my site no longer does. How can I make sure this person can no longer see the stats for my site?
Simply delete the verification file. When we do our next periodic check, if that person tries to see stats for the site, we’ll ask for the file again. Since that person no longer has write access to the site, we won’t find the file and won’t show the additional statistics any longer.

But I still want to see stats for the site. How can I do that if my site is no longer verified?
We ask for a different verification file for each Google Account. You can log in with your Google Account and still see statistics.

The person who uploaded the Sitemaps for my site no longer has write access. I don’t have access to that person’s Google Account. How can I see information about my site?
Simply log in with your own Google Account and submit your Sitemaps using that account. We’ll ask you to upload a verification file that is unique to your Google Account so that you can see additional stats for your site.

Permalink

All new!


We’ve added some new features to Google Sitemaps.

Date stamps for statistics
For information we provide once you’ve verified your site, we now let you know when we tried to crawl the URLs we tell you about.

Enhanced support for special characters in URLs
Note that the Sitemap URL must be encoded for readability by the webserver on which it is located. In addition, it can contain only ASCII characters. It can't contain upper ASCII characters, certain control codes, or special characters such as * and {}. If your Sitemap URL contains out-of-range characters, escape them when you submit the URL. Otherwise, you'll receive an error when you try to submit it. You can find more information on escaping out-of-range characters by doing a Google search for [html escape codes]. All URLs must follow the RFC-3986 standard for URIs and the RFC-3987 standard for IRIs.
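
As an illustration, the path of a Sitemap URL can be percent-escaped in Python with the standard library. This is a sketch rather than a description of what the submission form does: the function name is hypothetical, and it assumes non-ASCII characters should be encoded as UTF-8 before escaping.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_sitemap_url(url: str) -> str:
    """Percent-escape out-of-range characters in the path of a Sitemap URL."""
    parts = urlsplit(url)
    # Escape everything in the path except "/" and unreserved characters;
    # non-ASCII text is encoded as UTF-8 first (an assumption).
    path = quote(parts.path, safe="/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Apply it once to an unescaped URL; running it on a URL that is already escaped would escape the percent signs a second time.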

Documentation updates
We’ve updated the documentation for these new features, as well as added information about the latest version of the Sitemap Generator script and about OAI-PMH submissions (both of which we talked about in earlier blog posts). We’ve also provided some information about errors you might come across when you submit a Sitemap. We’ve made all these updates in every language for which we provide documentation.

Resolved issues
And we’ve resolved two issues with this release that you brought to our attention in the Google Group.
  • Some of you experienced trouble when clicking on the verify link on the My Sitemaps page. Instead of being asked to verify, the My Sitemaps page simply reloaded. We've fixed this.
  • Some of you had trouble adding Sitemaps from Internet Explorer. This happened when you pressed Enter rather than clicked the Submit URL link. You saw a message that said your Sitemap was submitted, but the Sitemap wasn’t added to your My Sitemaps page. We've fixed this too.

If either happens to you, or if you experience any other trouble, please let us know by posting in the Google Group.

Several of these features were a direct result of your feedback. Once again, we appreciate your input during our beta period.

Permalink

We show you more


Just about a month ago, Google Sitemaps added new statistics about problems Google encountered crawling your pages. This stats page showed you up to three URLs we had trouble accessing for each type of error. You asked for more. So we’re giving you more.

Now, once you verify your site, we’ll show you up to 10 URLs we’ve had trouble accessing for each type of error, for a maximum total of 60 URLs.

Keep posting your suggestions to our Google Group and we'll keep listening. Thanks for your participation during our beta period.

Permalink

How is a Google Sitemap different from an HTML sitemap?


A Google Sitemap is an XML file that uses the Sitemap protocol. This file lists URLs in your site, along with optional descriptive information about those URLs (such as when they were last updated and how often you modify them). You can create this XML file using our Sitemap Generator or a third-party tool. Google Sitemaps are intended for processing by the Google Sitemaps program.
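
To make the file format more concrete, here is a minimal Python sketch that writes such a file. It is not the Sitemap Generator: the function name is hypothetical, the element names (loc, lastmod, changefreq, priority) and the namespace are those of the current sitemaps.org protocol, and treating them as valid for your submission is an assumption you should check against the documentation.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # assumed namespace


def build_sitemap(entries, namespace=SITEMAP_NS):
    """Return a Sitemap document.

    entries: iterable of dicts with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys.
    """
    root = ET.Element("urlset", xmlns=namespace)
    for entry in entries:
        url = ET.SubElement(root, "url")
        for key in ("loc", "lastmod", "changefreq", "priority"):
            if key in entry:
                ET.SubElement(url, key).text = str(entry[key])
    return ET.tostring(root, encoding="unicode")
```

Only the URL itself is required for each entry; the other fields are the optional descriptive information mentioned above.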

An HTML sitemap is intended for users of your site. Generally, this type of sitemap provides links to the pages in your site, and may provide descriptions of those pages. We encourage the use of HTML sitemaps. They make it easier for users to navigate your site. Also, as we talk about in our webmaster guidelines, a clear hierarchy with text and links helps us index your site.

You can’t submit an HTML sitemap to the Google Sitemaps program. However, if you are unable to create or generate a Google Sitemap file in the Sitemap protocol format, you can submit a text file that lists URLs in your site.

Permalink

Using OAI-PMH with Google Sitemaps


If your site uses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) 2.0 protocol, an application-independent interoperability framework based on metadata harvesting, you can use your OAI repository as your Sitemap.

Simply submit the baseURL of your OAI repository (for instance, http://www.example.com/oaiserver). When we query the baseURL, we automatically add query parameters (such as ?verb=Identify or ?verb=ListRecords), so you can simply submit the baseURL itself. When we extract the URLs for your site, we expect the records in the repository to be formatted using Dublin Core, with the URLs embedded in <dc:identifier> tags. Below is a sample record that includes a <dc:identifier> tag. The URL listed in that tag is what we extract.
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title xml:lang="en">A title of extraordinary things</dc:title>
<dc:creator>McCormack, Michael</dc:creator>
<dc:subject>LCSH:Ausdehnungslehre; LCCN QA205.H99; Greatness:Amanda</dc:subject>
<dc:publisher>J. Wiley &amp; Sons</dc:publisher>
<dc:date>Created: 1906; Available: 1991</dc:date>
<dc:type>text</dc:type>
<dc:identifier>http://example.com/physics/1796949</dc:identifier>
<dc:language>english</dc:language>
<dc:rights xml:lang="en">Public Domain</dc:rights>
</oai_dc:dc>
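
To check which URLs a record like this exposes, you could run a small Python script over it. This is a sketch, not Google's extraction code: the function name is hypothetical, and skipping identifiers that don't start with http:// or https:// is an assumption.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace, as declared in the sample record.
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_identifier_urls(record_xml: str) -> list:
    """Return the URLs found in <dc:identifier> tags of one record."""
    root = ET.fromstring(record_xml)
    urls = []
    for element in root.iter("{%s}identifier" % DC_NS):
        text = (element.text or "").strip()
        # Identifiers are not always URLs; keep only those that are.
        if text.startswith(("http://", "https://")):
            urls.append(text)
    return urls
```

The record must be well-formed XML for this to work, so a literal ampersand in a field needs to be written as &amp;.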
As with other Sitemaps, the URLs must be within the same site and at the same directory location or lower than the baseURL. For instance, if you submit http://www.example.com/oaiserver as the baseURL, the following URLs would be valid:
http://www.example.com/
http://www.example.com/samples.html
http://www.example.com/images/
However, if you submit http://www.example.com/dataprovider/oaiserver, then none of those URLs would be valid.

Permalink

Combining Sitemaps into one larger Sitemap


Do you have several small Sitemaps that you would like to combine into one larger one? With version 1.3 of the Sitemap Generator, which we told you about yesterday, you can do just that.

This version includes a new input method: sitemap, which lets you point to existing Sitemaps that you created with the Sitemap Generator. The Sitemap Generator will create a single Sitemap that includes the URLs contained in each Sitemap you list.

To use this input method, locate the sitemap section of the config file and modify it as needed. You should include one <sitemap> entry for each Sitemap you want to include. Each entry must contain the path parameter, whose value should be the path and filename of an existing Sitemap file.

The example_config.xml file included with the Sitemap Generator download includes a sample <sitemap> section:

<!-- ** MODIFY or DELETE **
"sitemap" nodes tell the script to scan other Sitemap files. This can
be useful to aggregate the results of multiple runs of this script into
a single Sitemap.

Required attributes: path - path to the file -->

<sitemap path="/var/www/docroot/subpath/sitemap.xml" />

This section gives one example. You should replace this example and include an entry for each Sitemap you want to include. Ensure that the path value is the complete path and filename on your web server. You can list gzipped Sitemaps as well, as long as they have a .gz extension. Rather than list each Sitemap, you can use wildcards. For instance, the following entry would include any Sitemaps that begin with the word "sitemap" and have an .xml extension:

<sitemap path="/var/www/docroot/subpath/sitemap*.xml" />
The Sitemap Generator extracts all URLs and the optional data listed for each URL for every Sitemap you list and creates one Sitemap with this information. At this time, we can't guarantee that this method will work for Sitemaps created with tools other than the Sitemap Generator.
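
The general idea of the merge can be sketched in a few lines of Python. This is an independent illustration, not the Sitemap Generator's own code: the function names are hypothetical, and the sketch does no de-duplication or validation.

```python
import glob
import gzip
import xml.etree.ElementTree as ET


def read_sitemap(path):
    """Parse one Sitemap file; .gz files are decompressed on the fly."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return ET.parse(handle).getroot()


def merge_sitemaps(pattern):
    """Combine every Sitemap matching a wildcard pattern into one root element.

    Returns None if no file matches the pattern.
    """
    merged = None
    for path in sorted(glob.glob(pattern)):
        root = read_sitemap(path)
        if merged is None:
            merged = ET.Element(root.tag)  # reuse the first file's root tag
        merged.extend(list(root))          # copy <url> entries with their data
    return merged
```

Because whole url entries are copied, the optional data listed for each URL travels with it into the combined Sitemap.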


Announcing Sitemap Generator version 1.3: Improved encoding support


The Sitemap Generator version 1.3 is now available and provides improved encoding support. If your webserver uses an encoding other than UTF-8, or if your domain name or some of the URLs in your site use non-ASCII characters, and you plan to use the Sitemap Generator to create your Sitemap, you should download this latest version.

Generally, non-ASCII URLs should be encoded using UTF-8 before being percent-escaped. However, some webservers respond correctly only if URLs are encoded specifically for the webserver's configuration. All URLs within your Sitemap, as well as the URL of the Sitemap itself, must be encoded for readability by the web server on which they are located.

If you are using the Sitemap Generator, you can specify the encoding of the URLs contained in the Sitemap from within the config.xml file. Within the site definition section of that config file, use the optional default_encoding attribute to specify the encoding used by your webserver. If you don't use this attribute and your webserver uses an encoding other than UTF-8, the Sitemap Generator can't know which encoding to use, although it does attempt to determine the correct encoding. If the generated Sitemap doesn't list the URLs correctly, you should explicitly indicate the encoding with the default_encoding attribute and run the Sitemap Generator again.

If your URLs contain non-ASCII characters, we recommend that you run the Sitemap Generator script using Python 2.3 or higher. This version of Python has increased non-ASCII support. If your domain name contains non-ASCII characters, you must use Python 2.3 or later, as Internationalizing Domain Names in Applications (IDNA) support wasn't added until this version. Without IDNA support, the Sitemap Generator can't correctly encode a non-ASCII domain name.
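
For illustration only, the two encoding steps described in this post look roughly like this in modern Python 3 (the Sitemap Generator itself targets older Python versions, so this is not its code). The function names are hypothetical; urllib.parse.quote and the built-in "idna" codec are standard library features.

```python
from urllib.parse import quote


def escape_path(path, encoding="utf-8"):
    """Encode path with the given charset, then percent-escape the bytes."""
    return quote(path, safe="/", encoding=encoding)


def ascii_hostname(hostname):
    """Convert a non-ASCII domain name to its ASCII (IDNA) form."""
    return hostname.encode("idna").decode("ascii")


# The same path escapes differently depending on the webserver's encoding.
print(escape_path("/café.html"))                # /caf%C3%A9.html
print(escape_path("/café.html", "iso-8859-1"))  # /caf%E9.html
print(ascii_hostname("bücher.example"))         # xn--bcher-kva.example
```

The difference between the first two results is why the encoding has to match what the webserver expects: a server configured for ISO-8859-1 may not recognize the UTF-8 form of the same URL.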


Google Sitemaps in your language


We’ve just made our Sitemaps user interface and documentation available in ten additional languages. We have also set up Google Groups for each one. The languages available are:

Brazilian Portuguese
Dutch
French
German
Italian
Korean
Russian
Simplified Chinese
Spanish
Traditional Chinese
UK English
US English

If you already use Google in one of these languages, you should see the change automatically. Otherwise, you can click the Preferences link from the Google home page and choose one of these languages from the interface list.

As always, you can submit a Sitemap for sites with content in any language.


Verifying your site: trouble with 404 pages


You want to verify your site so you can view additional statistics. You click the verify link beside the site on the My Sitemaps page, create the file we ask for, upload it to your server, and click the Check Status button. And then you see this error message:

We've detected that your 404 (file not found) error page returns a status of 200 (OK) in the header.

What should you do?

This error means that we've detected that your server returns a status of OK when the requested file is not found. This is the same status that the server returns when the file exists. When we look for the verification file, we can't tell if your server is returning a status of OK because it finds the file, or because it can't find the file. This means we are unable to verify your site.

Modify your web server configuration to return a status of 404 (file not found) in the header of 404 pages. If your site is hosted, ask your hosting company to do this.

Make sure that if your server returns a custom error page when a requested file is not found, that page returns a 404 status in the header. And make sure that the server doesn't redirect requests that return "file not found" to a valid page of your site, such as your home page. This configuration returns a redirect status code (such as 301 or 302) rather than the correct 404 status code.

You can read more about HTTP status codes here. If you don't have a mechanism for checking the headers that your server returns, you can do a search for terms such as [check server header tool] to find online tools that will check this for you.

Once your web server is configured correctly, try to verify your site again and we'll check the configuration.
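
If you'd rather check the header yourself, the following Python 3 sketch requests a page that almost certainly doesn't exist and reports the raw status code. It is only an approximation of the check described above, not the one Google runs; the function names and the probe filename are made up for this example. It uses http.client because that module does not follow redirects, so a 301 or 302 stays visible.

```python
import http.client
import uuid
from urllib.parse import urlsplit


def fetch_status(url):
    """Return the raw HTTP status for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("GET", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


def diagnose_missing_page(site_url, fetch=fetch_status):
    """Request a random, nonexistent page and describe the server's answer."""
    probe = site_url.rstrip("/") + "/no-such-page-" + uuid.uuid4().hex + ".html"
    status = fetch(probe)
    if status == 404:
        return "ok: missing pages return 404"
    if status == 200:
        return "problem: missing pages return 200 (OK)"
    if status in (301, 302):
        return "problem: missing pages are redirected (%d)" % status
    return "unexpected status %d" % status
```

For example, print(diagnose_missing_page("http://www.example.com/")) reports how that server answers a request for a missing file.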


Submitting mobile Sitemaps


Whether you've already created and submitted Sitemaps for your non-mobile pages or want to submit a mobile Sitemap for the first time, here are a few tips to help you get started:

Identify your mobile Sitemaps content
  • If you have content other than XHTML mobile profile, WML, and cHTML, you should create a separate non-mobile Sitemap for those and submit it using the Add a Sitemap page.

  • If your site serves multiple mobile markup languages, you should create a separate mobile Sitemap for each markup language.

  • If you have URLs that serve multiple markup languages, you should include those URLs in each mobile Sitemap that applies. For instance, if you have a URL that serves both XHTML and WML content, include that URL in two mobile Sitemaps, one for each markup language.

  • If you have a large site and use Sitemap index files to manage a large number of mobile Sitemaps, make sure that each Sitemap index file only includes mobile Sitemaps for one markup language.
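
The grouping rules above can be sketched in a few lines of Python 3. This is only an illustration: the function name and the short markup labels ("xhtml", "wml", "chtml") are invented for the example, and the input is assumed to be a mapping from each URL to the set of markup languages it serves.

```python
from collections import defaultdict

# Markup languages named in this post; the short keys are illustrative labels.
MOBILE_MARKUPS = ("xhtml", "wml", "chtml")


def group_by_markup(url_markups):
    """Map each mobile markup language to the sorted URLs that serve it."""
    groups = defaultdict(list)
    for url, markups in url_markups.items():
        for markup in markups:
            # Anything else (for example "html") belongs in a non-mobile Sitemap.
            if markup in MOBILE_MARKUPS:
                groups[markup].append(url)
    return {markup: sorted(urls) for markup, urls in groups.items()}
```

A URL that serves both XHTML and WML ends up in both groups, so it would be listed in two mobile Sitemaps, one per markup language.
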
Create your mobile Sitemaps
  • You can create mobile Sitemaps in the same way as other Sitemaps. The only format not supported is OAI-PMH.

  • If you are submitting a syndication feed, and the URLs listed in that feed serve multiple markup languages, decide which markup language fits best. You can’t submit the syndication feed multiple times, each for a different markup language, since each Sitemap (within the same directory) must have a unique name.
Submit your mobile Sitemaps
  • Once you’ve created your Sitemap, log in to your Google Account and submit it using the Add a Mobile Sitemap page.

  • Specify the markup language that the URLs listed in the mobile Sitemap serve.

  • Once you’ve submitted the mobile Sitemap it will be listed on your My Sitemaps page as a “Mobile” type.


Mobile pages and new statistics


We just launched two new features.

Mobile Sitemaps
You can already use Google Mobile Web Search on your mobile phone to search through sites that have been specifically designed for mobile phones, PDAs, and other handheld devices. We add new sites to our mobile web index regularly.

You can help users find your mobile webpages by letting us know about those pages. Google Mobile Sitemaps lets you submit Sitemaps for URLs that serve mobile content. You create and submit a mobile Sitemap much in the same way you do other Sitemaps: with the Sitemap Generator, the Sitemap protocol, or via a syndication feed or text file.

The biggest difference with a mobile Sitemap is that you have to submit a separate one for each markup language. Right now we support:

  • XHTML mobile profile (WAP 2.0)

  • WML (WAP 1.2)

  • cHTML (iMode)

When you submit your mobile Sitemap, let us know which markup language the URLs in the Sitemap serve.

New Site Statistics
Now you can see information for the URLs in your site that we've had trouble accessing -- both for URLs from your Sitemap and those we've discovered during a regular crawl. We won't show you these additional statistics until you verify your site, which is a very simple process. Click the verify link next to the site on your My Sitemaps page, create an empty file using the name we specify, and upload it to the folder where your Sitemap is located. We'll check to see that the file is there, which tells us that you have permission to upload files to that site, and then we'll show you the information.


What URLs should a Sitemap include?


So, what URLs from your site should you put in your Sitemap?

Put in everything! List the URLs that contain your content, images, media, and anything else in your site.

If you want to include only a subset of items, you can, but we’d like as much information about your site as you can give us.

Remember that we respect robots.txt, so if you include any URLs in your Sitemap that are restricted in robots.txt, we won’t crawl those.
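
To see in advance which URLs in a Sitemap would be skipped, you could run them through a robots.txt parser. The Python 3 sketch below uses the standard library's urllib.robotparser; it is not Google's parser and may interpret some rules differently, and the function name and the default "Googlebot" user agent string are assumptions of the example.

```python
from urllib.robotparser import RobotFileParser


def crawlable_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text does not restrict."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


robots = "User-agent: *\nDisallow: /private/\n"
candidates = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
]
# Only the first URL is kept; the second is restricted by robots.txt.
print(crawlable_urls(robots, candidates))
```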


What's in a name?


How should you name your Sitemap? What extension should you give it?

The short answer is that you can name your Sitemap anything you want. You can use any extension. Just submit the URL to us, and we’ll go pick it up.

The better answer is a little longer. We recommend that you give your Sitemap an extension that identifies the file type. For instance, if you create a simple text file that lists URLs, we suggest giving that Sitemap a .txt extension.

If you create an XML Sitemap that uses our Sitemap protocol, give it an .xml extension. If you compress that file using gzip, give it an .xml.gz extension.

If you use our Sitemap Generator to create a Sitemap, you specify the resulting Sitemap name in the config.xml file. The default name is sitemap.xml.gz. If you keep the .gz extension, the resulting Sitemap file will be compressed. If you change this name to have an .xml extension, the resulting file will not be compressed. We suggest you compress the file so that your webserver will take less of a bandwidth hit when we download it.
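
If you create your Sitemap by hand or with another tool, you can compress it yourself. As one possible approach, this Python 3 sketch writes a gzipped copy next to the original file; the function name is hypothetical.

```python
import gzip
import shutil


def compress_sitemap(xml_path):
    """Write a gzipped copy of xml_path as xml_path + '.gz' and return its name."""
    gz_path = xml_path + ".gz"
    with open(xml_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

Calling compress_sitemap("sitemap.xml") produces sitemap.xml.gz, which matches the naming suggested above.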

You can submit the URL of a script that dynamically generates an XML Sitemap when we download it. That script might have an extension such as .asp or .php (depending on the script type). The extension of the file isn’t a problem, but if your script takes a long time to run, the delay will look like a server timeout and we’ll try again later. If you have trouble getting this type of Sitemap submitted, make sure your script is responsive. Also ensure that your webserver doesn’t automatically add things (such as HTML headers and footers) to the generated files, since that would cause the resulting XML file to have parsing errors.

One more thing about naming. You can name your Sitemap anything you want… almost. You can’t name it robots.txt. And if you use a robots.txt file for your site, make sure that it doesn’t restrict our access to your Sitemap file.


Using Sitemap Index Files


Several of you have asked us, "Should I submit Sitemaps or Sitemap indexes for my site?"

If you have a small site, you probably don't need to use a Sitemap index file -- you can just list all of your URLs in one Sitemap.

If you have a larger site, you may want or need to have multiple Sitemaps for your site. In that case, you can make submitting and tracking easier by listing the Sitemaps in a Sitemap index file.

You must use multiple Sitemaps for your site when:
  • You have more than 50,000 URLs to list. That's the maximum that one Sitemap can include.

  • A single Sitemap would be larger than 10MB, uncompressed. That's the maximum size for a Sitemap.
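
As a rough sketch of how these two limits might be applied when generating Sitemaps, the Python 3 code below splits a URL list into groups of at most 50,000 and checks a generated file's size. The names are hypothetical, and treating 10MB as 10 * 1024 * 1024 bytes is an assumption of the example.

```python
MAX_URLS_PER_SITEMAP = 50000
MAX_SITEMAP_BYTES = 10 * 1024 * 1024  # assumed meaning of "10MB"


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most limit URLs."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def within_size_limit(uncompressed_xml):
    """Check the uncompressed size of one generated Sitemap (as bytes)."""
    return len(uncompressed_xml) <= MAX_SITEMAP_BYTES
```

A site with 120,001 URLs would need three Sitemaps under this scheme: two with 50,000 URLs each and one with 20,001. If a group still produces a file over the size limit, it would need to be split further.
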
You may also want to use multiple Sitemaps to make organization easier, for instance if:
  • You want to store all of your archived URLs in one Sitemap and all of your URLs that change frequently in another. This way, when you add new URLs to the Sitemap, you'll have a smaller and more manageable file to work with.

  • You manage multiple sites that are in subfolders. You might want to create a Sitemap for each site and then create a Sitemap index file that lists them in the root. Remember, this method works only with subfolders. For example, this Sitemap index file:
    www.example.com/sitemap_index.xml
    could list the following Sitemaps:
    www.example.com/site1/sitemap.xml
    www.example.com/site2/sitemap.xml
    However, that Sitemap index file could not list the following Sitemaps:
    site1.example.com/sitemap.xml
    site2.example.com/sitemap.xml
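
The subfolder rule can be expressed as a small check. This Python 3 sketch is one interpretation of the rule, not an official validator: it assumes full URLs with a scheme, and it treats a Sitemap as listable only when it is on the same host as the index and at or below the index's folder. The function name is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def can_list(index_url, sitemap_url):
    """Return True if sitemap_url is on the index's host, at or below its folder."""
    index, sitemap = urlsplit(index_url), urlsplit(sitemap_url)
    if (index.scheme, index.netloc) != (sitemap.scheme, sitemap.netloc):
        return False
    index_dir = posixpath.dirname(index.path).rstrip("/") + "/"
    return sitemap.path.startswith(index_dir)


index = "http://www.example.com/sitemap_index.xml"
print(can_list(index, "http://www.example.com/site1/sitemap.xml"))  # True
print(can_list(index, "http://site1.example.com/sitemap.xml"))      # False
```
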
Creating an index of Sitemap index files
You can also have an index of Sitemap index files. A Sitemap index file can be a maximum of 10MB as well, so if you have a really large site, you may have to use this additional organization step to keep the file sizes to a manageable level. We have a size limitation for Sitemaps and Sitemap indexes so that when we download the files, we don't overwhelm your bandwidth.

Compressing your Sitemap index file
Speaking of being considerate of your bandwidth, if you can, you should compress your Sitemaps and your Sitemap indexes using gzip. If you're not familiar with gzip, keep watching this blog. We're putting together some helpful instructions.

If you compress your Sitemap index file, you'll probably want to give it an .xml.gz extension. If you don't compress your Sitemap index file, you'll probably want to give it an .xml extension.

Submitting your Sitemap index file
So, you've got some individual Sitemaps that you've listed in a Sitemap index file. What now? Just sign in to Google Sitemaps and submit the Sitemap index file. You don't need to submit individual Sitemaps that are included in the index. Once we've processed your Sitemap index file, we'll let you know if we found errors in the Sitemap index itself, or in any of the individual Sitemaps.

If you make changes to a Sitemap included in a Sitemap index file you've submitted, just change the lastmod date for that Sitemap in your index. During this beta period, feel free to resubmit the Sitemap index file.
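
If you generate your Sitemap index with a script, updating lastmod can be part of that script. The Python 3 sketch below builds an index from (URL, lastmod) pairs. The function name is hypothetical, and the namespace shown is assumed to be the 0.84 schema in use for the Sitemap protocol; check it against the protocol documentation before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the Sitemap protocol schema; verify before use.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap_index(entries):
    """Build a Sitemap index (as bytes) from (sitemap_url, lastmod) pairs."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        node = ET.SubElement(root, "sitemap")
        ET.SubElement(node, "loc").text = loc
        ET.SubElement(node, "lastmod").text = lastmod
    return ET.tostring(root, encoding="utf-8")


index_xml = build_sitemap_index([
    ("http://www.example.com/site1/sitemap.xml", "2005-12-01"),
    ("http://www.example.com/site2/sitemap.xml", "2005-12-15"),
])
```

When one of the listed Sitemaps changes, regenerating the index with a newer date for that entry is all this approach requires.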


Just getting started...


We are stoked that so many of you are trying out Sitemaps while it's in beta. When we read on the Sitemaps Google Group that you'd like an easy way to find out about new features and where we're headed with this thing, we realized we could put our very own Blogger to good use.

And since the Sitemaps team is far-flung in Mountain View, Kirkland, and Zurich, this will be a good way to keep everyone posted about new features and developments. Subscribe to our feed to keep up with the latest. We'll also e-mail each blog post to the Sitemaps Google Group, so if you get your news from there, you won't miss out on a thing.

In addition to telling you about new features here, we'll also do our best to address some of the frequently asked questions from the Google Group.

Lately, some of you have wondered if submitting a Sitemap could actually reduce the number of your site pages we have indexed.

No, submitting a Sitemap will not reduce the number of indexed pages for your site.

As we note in our information for Webmasters, each time we update our database of webpages, our index shifts: we find new sites, we lose some sites, and sites' rankings change. If your site was dropped from Google and you haven't made major changes to it, we'll likely pick it up again soon. If your site's ranking changes, ensure you are following our guidelines.

When you submit your Sitemap, you help us learn more about the contents of your site. Participation in this program will not affect your pages' rankings or cause your pages to be removed from our index.




Copyright © 2005 Google Inc. All rights reserved.