




Many people talk about the effect the Internet has had in democratizing access to information, but as someone who has been visually impaired since my teenage years, I can speak firsthand to the profound impact it has had on my life.

In everyday life, things like a sheet of paper—and anything written on it—are completely inaccessible to a blind or visually impaired user. But with the Internet a new world has opened up for me and so many others. Thanks to modern technology like screen readers, web pages, books, and web applications are now at our fingertips.

To help the visually impaired find the most relevant, useful information on the web as quickly as possible, we developed Accessible Search. Google Accessible Search identifies and prioritizes search results that are more easily used by blind and visually impaired users – that means pages that are clean and simple (think of the Google homepage!) and that can load without images.

Why should you take the time to make your site more accessible? In addition to the service you'll be doing for the visually-impaired community, accessible sites are more easily crawled, which is a first step in your site's ability to appear in search results.

So what can you do to make your sites more accessible? Well, first of all, think simple. In its current version, Google Accessible Search looks at a number of signals by examining the HTML markup found on a web page. It tends to favor pages that degrade gracefully: pages with few visual distractions that are likely to render well with images turned off. Flashing banners and dancing animals are probably the worst things you could put on your site if you want its content to be read by an adaptive technology like a screen reader.
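
As a rough illustration (the file names and text below are made up), here's the kind of markup that degrades gracefully: the image carries descriptive alt text, so the page still makes sense with images turned off, and the link is plain HTML that can be reached with the keyboard alone:
  <h1>Weekly gardening tips</h1>
  <!-- Descriptive alt text keeps the image meaningful when images are off -->
  <img src="tomatoes.jpg" alt="Ripe tomatoes on the vine">
  <!-- A plain text link is announced by screen readers and reachable by keyboard -->
  <p>Read the <a href="watering-guide.html">complete watering guide</a>.</p>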

Here are some basic tips:
  1. Keep web pages easy to read, avoiding visual clutter and ensuring that the primary purpose of the web page is immediately accessible with full keyboard navigation.

  2. There are many organizations and online resources that offer website owners and authors guidance on making websites and pages more accessible for the blind and visually impaired. The W3C, for example, publishes numerous guidelines, including the Web Content Accessibility Guidelines, that are helpful for website owners and authors.

  3. As with regular search, the best thing you can do with respect to making your site rank highly is to create unique, compelling content. In fact, you can think of the Google crawler as the world's most influential blind user. The content that matters most to the Googlebot is the content that matters most to the blind user: good, quality text.

  4. It's also worth reviewing your content to see how accessible it is for other end users. For example, try browsing your site on a monochrome display or try using your site without a mouse. You may also consider your site's usability through a mobile device like a Blackberry or iPhone.

Fellow webmasters, thanks for taking the time to better understand principles of accessibility. In my next post I'll talk about how to make sure that critical site features, like site navigation, are accessible. Until then!

Written by Juliane Stiller, Search Quality


Our German Webmaster Central Blog celebrates its first birthday and we'd like to raise our glasses to 57 published posts in the last year! We enjoy looking back at an exciting first year of blogging and communicating with webmasters. It's the growing webmaster community that made this blog a success. Thanks to our readers for providing feedback on our blog posts and posting in the German Webmaster Help group.

Over the past year, we published numerous articles specifically targeted to the German market - topics ranging from affiliate programs to code snippets. We also translated many of the applicable English posts for the German blog. If you speak German (Hallo!), come check out the German Webmaster Blog and subscribe to our feed or email alerts.

Hope to see you soon,
Juliane Stiller on behalf of the German Webmaster Communication Team



We're happy to announce that the Message Center supports a new "messages waiting" feature. Previously, it could only store penalty notifications for existing verified site owners (webmasters who had already verified their sites). Now the Message Center can also hold these messages for future owners, i.e. those who haven't yet registered with Google's Webmaster Tools.

Creating a new Webmaster Tools account and verifying your site gives you access to any message from Google concerning violations of our Webmaster Guidelines. Messages sent after the launch of this feature can now be retrieved for one year and will remain in your account until you choose to delete them.

Some questions you might be asking:

Q: What happens to old messages when a site changes ownership?
A: In the case of a change of ownership, the new verified owners will also be able to retrieve messages as noted above.

Q: If a site has more than one verified owner and one of them deletes a message, will it be deleted for all the other site owners as well?
A: No, each owner gets his or her own copy of the message when retrieving it. Deleting one copy does not affect any past, current, or future message retrievals.

Just as before, if you've received a message alerting you to Webmaster Guidelines violations, you can make the necessary changes so that your site is in line with our guidelines. Then, sign in to Webmaster Tools and file a reconsideration request.




googlebot with flowers
Name/User-Agent: Googlebot
IP Address: Verify it here
Looking For: Websites with unique and compelling content
Major Turn Off: Violations of the Webmaster Guidelines
Googlebot -- what a dreamboat. It's like he knows us <head>, <body>, and soul.  He's probably not looking for anything exclusive; he sees billions of other sites (though we share our data with other bots as well :), but tonight we'll really get to know each other as website and crawler.

I know, it's never good to over-analyze a first date. We're going to get to know Googlebot a bit more slowly, in a series of posts:
  1. Our first date (tonight!): Headers Googlebot sends, file formats he "notices," whether it's better to compress data
  2. Judging his response: Response codes (301s, 302s), how he handles redirects and If-Modified-Since
  3. Next steps: Following links, having him crawl faster or slower (so he doesn't come on too strong)
And tonight is just the first date...

***************
Googlebot:  ACK
Website:  Googlebot, you're here!
Googlebot:  I am.

GET / HTTP/1.1
Host: example.com
Connection: Keep-alive
Accept: */*
From: googlebot(at)googlebot.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Accept-Encoding: gzip,deflate

Website:  Those headers are so flashy! Would you crawl with the same headers if my site were in the U.S., Asia or Europe? Do you ever use different headers?

Googlebot:  My headers are typically consistent world-wide. I'm trying to see what a page looks like for the default language and settings for the site. Sometimes the User-Agent is different, for instance AdSense fetches use "Mediapartners-Google":
  User-Agent: Mediapartners-Google

Or for image search:
  User-Agent: Googlebot-Image/1.0

Wireless fetches often have carrier-specific user agents, whereas Google Reader RSS fetches include extra info such as number of subscribers.

I usually avoid cookies (so no "Cookie:" header) since I don't want the content affected too much by session-specific info. And, if a server uses a session id in a dynamic URL rather than a cookie, I can usually figure this out, so that I don't end up crawling your same page a million times with a million different session ids.
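
For example, these two (made-up) URLs would almost certainly lead to the same content, so I try to treat them as a single page rather than two:
  http://www.example.com/page1?sessionid=ABC123
  http://www.example.com/page1?sessionid=XYZ789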


Website:  I'm very complex. I have many file types. Your headers say "Accept: */*". Do you index all URLs or are certain file extensions automatically filtered?

Googlebot:  That depends on what I'm looking for.

If I'm indexing for regular web search, and I see links to MP3s and videos, I probably won't download those. Similarly, if I see a JPG, I will treat it differently than an HTML or PDF link. For instance, JPG is much less likely to change frequently than HTML, so I will check the JPG for changes less often to save bandwidth. Meanwhile, if I'm looking for links as Google Scholar, I'm going to be far more interested in the PDF article than the JPG file. Downloading doodles (like JPGs) and videos of skateboarding dogs is distracting for a scholar—do you agree?

Website:  Yes, they can be distracting. I'm in awe of your dedication. I love doodles (JPGs) and find them hard to resist.

Googlebot:  Me, too; I'm not always so scholarly. When I crawl for image search, I'm very interested in JPGs. And for news, I'm mostly looking at HTML and nearby images.

There are also plenty of extensions (exe, dll, zip, dmg...) that tend to be big and less useful for a search engine.


Website:  If you saw my URL, http://www.example.com/page1.LOL111, would you (whimper whimper) reject it just because it contains an unknown file extension?

Googlebot:  Website, let me give a bit more background. After actually downloading a file, I use the Content-Type header to check whether it really is HTML, an image, text, or something else. If it's a special data type like a PDF file, Word document, or Excel spreadsheet, I'll make sure it's in a valid format and extract the text content. Maybe it has a virus; you never know. If the document or data type is really garbled, there's usually not much to do besides discard the content.

So, if I'm crawling http://www.example.com/page1.LOL111 with an unknown file extension, it's likely that I would start to download it. If I can't figure out the content type from the header, or it's a format that we don't index (e.g. mp3), then it'll be put aside. Otherwise, we proceed with indexing the file.
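
To give you an idea, a response that begins with headers like these (purely illustrative) tells me right away that I'm looking at ordinary HTML:
  HTTP/1.1 200 OK
  Content-Type: text/html; charset=UTF-8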


Website:  My apologies for scrutinizing your style, Googlebot, but I noticed your Accept-Encoding headers say:
Accept-Encoding: gzip,deflate

Can you explain these headers to me?

Googlebot:  Sure. All major search engines and web browsers support gzip compression for content to save bandwidth. Other entries that you might see here include "x-gzip" (the same as "gzip"), "deflate" (which we also support), and "identity" (none).
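
For example, if I've sent "Accept-Encoding: gzip,deflate" and your server supports compression, it can return a compressed body along with a matching header, roughly like this (illustrative only):
  HTTP/1.1 200 OK
  Content-Encoding: gzip
  Content-Type: text/html; charset=UTF-8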


Website:  Can you talk more about file compression and "Accept-Encoding: gzip,deflate"? Many of my URLs consist of big Flash files and stunning images, not just HTML. Would it help you to crawl faster if I compressed my larger files?

Googlebot:  There's not a simple answer to this question. First of all, many file formats, such as swf (Flash), jpg, png, gif, and pdf are already compressed (there are also specialized Flash optimizers).

Website: Perhaps I've been compressing my Flash files and I didn't even know? I'm obviously very efficient.

Googlebot:  Both Apache and IIS have options to enable gzip and deflate compression, though there's a CPU cost involved for the bandwidth saved. Typically, it's only enabled for easily compressible text content (HTML/CSS/PHP). And it only gets used if the user's browser or I (a search engine crawler) allow it. Personally, I prefer "gzip" over "deflate". Gzip is a slightly more robust encoding — there is consistently a checksum and a full header, giving me less guess-work than with deflate. Otherwise they're very similar compression algorithms.

If you have some spare CPU on your servers, it might be worth experimenting with compression (links: Apache, IIS). But, if you're serving dynamic content and your servers are already heavily CPU loaded, you might want to hold off.
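
As a rough sketch (assuming Apache 2.x with mod_deflate enabled; see the Apache documentation linked above for the details), turning on compression for text content can be as simple as adding a line like this to your configuration:
  # Compress only easily compressible text formats
  AddOutputFilterByType DEFLATE text/html text/css text/plain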


Website:  Great information. I'm really glad you came tonight — thank goodness my robots.txt allowed it. That file can be like an over-protective parent!

Googlebot:  Ah yes; meeting the parents, the robots.txt. I've met plenty of crazy ones. Some are really just HTML error pages rather than valid robots.txt. Some have infinite redirects all over the place, maybe to totally unrelated sites, while others are just huge and have thousands of different URLs listed individually. Here's one unfortunate pattern. The site is normally eager for me to crawl:
  User-Agent: *
  Allow: /


Then, during a peak time with high user traffic, the site switches the robots.txt to something restrictive:
  # Can you go away for a while? I'll let you back
  # again in the future. Really, I promise!
  User-Agent: *
  Disallow: /


The problem with the above robots.txt file-swapping is that once I see the restrictive robots.txt, I may have to start throwing away content from the index that I've already crawled. And then I have to recrawl a lot of content once I'm allowed to hit the site again. At least a 503 response code would've been temporary.
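
For instance, during that busy stretch the server could have answered my requests with something like this (a sketch; the Retry-After value is just an example), and I'd simply have come back later without dropping anything:
  HTTP/1.1 503 Service Unavailable
  Retry-After: 3600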

I typically only re-check robots.txt once a day (otherwise on many virtual hosting sites, I'd be spending a large fraction of my fetches just getting robots.txt, and no date wants to "meet the parents" that often). For webmasters, trying to control crawl rate through robots.txt swapping usually backfires. It's better to set the rate to "slower" in Webmaster Tools.


Googlebot:  Website, thanks for all of your questions, you've been wonderful, but I'm going to have to say "FIN, my love."

Website:  Oh, Googlebot... ACK/FIN. :)

***************



Update: The described feature is no longer available.

When you plan to do something, are you a minimalist, or are you prepared for every potential scenario? For example, would you hike out into the Alaskan wilderness during inclement weather with only a wool overcoat and a sandwich in your pocket - like the naturalist John Muir (and you thought Steve McQueen was tough)?

Or are you more the type of person who, even on a day hike, brings a few changes of clothes, 3 dehydrated meals, a couple of kitchen appliances, a power inverter, and a foot-powered generator, because, well, you never know when the urge will arise to make toast?

The Webmaster Tools team strives to serve all types of webmasters, from the minimalist to those who use every tool they can find. If you're reading this blog, you've probably had the opportunity to use the current version of Webmaster Tools, which offers as many features as possible just shy of the kitchen sink. Now there's something for those of you who would prefer to access only the features of Webmaster Tools that you need: we've just released Webmaster Tools Gadgets for iGoogle.

Here's the simple process to start using these Gadgets right away. (Note: this assumes you've already got a Webmaster Tools account and have verified at least one site.)

1. Visit Webmaster Tools and select any site that you've verified from the dashboard.
2. Click on the Tools section.
3. Click on the Gadgets sub-section.
4. Click on the big "Add an iGoogle Webmaster Tools homepage" button.
5. Click the "Add to Google" button on the confirmation page that follows to add the new tab to iGoogle.
6. Now you're in iGoogle, where you should see your new Google Webmaster Tools tab with a number of Gadgets. Enjoy!

You'll notice that each Gadget has a drop-down menu at the top that lets you choose from all the sites you have verified, so you can see that Gadget's information for the particular site you select. A few of the Gadgets that we're currently offering are:

Crawl errors - Does Googlebot encounter issues when crawling your site?



Top search queries - What are people searching for to find your site?



External links - What websites are linking to yours?




We plan to add more Gadgets in the future and improve their quality, so if there's a feature that you'd really like to see which is not included in one of the Gadgets currently available, let us know. As you can see, it's a cinch to get started.

It looks like rain clouds are forming over here in Seattle, so I'm off for a hike.


Last spring, the Sitemaps protocol was expanded to include the autodiscovery of Sitemaps using robots.txt to let us and other search engines supporting the protocol know about your Sitemaps. We subsequently also announced support for Sitemap cross-submissions using Google Webmaster Tools, making it possible to submit Sitemaps for multiple hosts on a single dedicated host. So it was only a matter of time before we took the next logical step of marrying the two and allowing Sitemap cross-submissions using robots.txt. And today we're doing just that.

We're making it easier for webmasters to place Sitemaps for multiple hosts on a single host and then letting us know by including the location of these Sitemaps in the appropriate robots.txt.

How would this work? Say for example you want to submit a Sitemap for each of the two hosts you own, www.example.com and host2.google.com. For simplicity's sake, you may want to host the Sitemaps on one of the hosts, www.example.com. For example, if you have a Content Management System (CMS), it might be easier for you to change your robots.txt files than to change content in a directory.

You can now exercise the cross-submission support via robots.txt (by letting us know the location of the Sitemaps):

a) The robots.txt for www.example.com would include:
Sitemap: http://www.example.com/sitemap-www-example.xml

b) And similarly, the robots.txt for host2.google.com would include:
Sitemap: http://www.example.com/sitemap-host2-google.xml

By indicating in each individual host's robots.txt file where that host's Sitemap lives, you are in essence proving that you own the host for which you are specifying the Sitemap. And by hosting all of the Sitemaps on a single host, you make your Sitemaps simpler to manage.
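
For reference, the cross-submitted files themselves are just ordinary Sitemaps. A minimal sketch of what sitemap-host2-google.xml might contain (the URL is illustrative) looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry for each URL on host2.google.com -->
  <url>
    <loc>http://host2.google.com/</loc>
  </url>
</urlset>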

We are making this announcement today on Sitemaps.org as a joint effort. To see what our colleagues have to say, you can also check out the blog posts published by Yahoo! and Microsoft.



If you've got JavaScript skills and you'd like to implement such things as Google Gadgets or Maps on your site, bring your laptops and come hang out with us in Mountain View.

This Friday, my team (Google Developer Programs) is hosting a hackathon to get you started with our JavaScript APIs. There will be plenty of our engineers around to answer questions. We'll start with short introductions of the APIs and then break into groups for coding and camaraderie. There'll be food, and prizes too.

The featured JavaScript APIs:
When: Friday, February 29 - two sessions (you're welcome to attend both)
  • 2-5:30 PM
  • 6-10 PM
Where: The Googleplex
Building 40
1600 Amphitheatre Pkwy
Mountain View, CA 94043
Room: Seville Tech Talk, 2nd floor

See our map for parking locations and where to check in. (Soon, you, too, will be making maps like this! :)

Just say yes and RSVP!

And no worries if you're busy this Friday; future hackathons will feature other APIs and more languages. Check out the Developer Events Calendar for future listings. Hope to see you soon.