
Webmaster Level: All

Every day, thousands of websites get hacked. Hacked sites can harm users by serving malicious software, collecting personal information, or redirecting them to sites they didn't intend to visit. Webmasters want to fix hacked sites quickly, but unfortunately recovering from a hack can be a complicated process.

We're trying to make the process of recovering from a hack easier for webmasters with features like Security Issues, Help for Hacked Sites, and a section of our forum just for hacked sites. Recently, we talked to two webmasters with hacked sites to learn more about how they were able to fix their sites. We're sharing their stories in the hope that they might provide ideas to other webmasters who have been victims of hacking. We're also using these stories and other feedback to improve our documentation for hacked sites and make the process easier for everyone going forward.

Case Study #1: Restaurant website with multiple hack-injected scripts

A restaurant website using WordPress received a message from Google in their Webmaster Tools account, alerting them that their site had been altered by hackers. To protect Google users, the website was labeled as hacked in Google's search results. The webmaster of the site, Sam, looked at the source code and noticed many unfamiliar links on the site with pharmaceutical terms such as "viagra" and "cialis." She also noticed many pages where the meta description tags (in the HTML) had added content such as "buy valtrex in florida." Many pages also contained hidden div tags (also in the HTML) that linked to other sites. None of these links were added by Sam.

Sam removed all of the hacked content she found and filed a reconsideration request. The request was rejected, but in the message she received from Google, she was advised to check for any unfamiliar scripts in any PHP files (or any other server files), as well as for changes to the .htaccess file. These files are likely to have scripts added by the hackers that modify the site. These scripts typically show the hacked content only to search engines, while hiding it from a normal user. Sam checked all of the .php files and compared them to the clean copies she had in her backup. She found new content added to her footer.php, index.php, and functions.php. When she replaced those files with the clean backups, she could no longer find any hacked content on her site. When she filed another reconsideration request, she got a response from Google notifying her that her site was free of hacked content!

Even though Sam had cleaned up the hacked content on her site, she knew that she would need to continue to secure her site against future attacks. She followed the steps below to keep her site safe in the future:

  • Keep the CMS (content management system, like WordPress, Joomla, or Drupal) up to date with the most current version. Make sure plugins are up to date as well.
  • Make sure the account used to access the administrative features of the CMS uses a difficult and unique password.
  • If the CMS supports it, enable 2-step verification for login. (This might also be called two-factor authentication or two-step authentication.) This is recommended for the account used for password recovery as well. Most email providers, like Google, Microsoft, and Yahoo, support this!
  • Make sure the plugins and themes you install are from a reputable source: pirated plugins or themes often contain code that makes it even easier for hackers to get in!

Case Study #2: Professional website with lots of hard-to-find hacked pages

A small business owner named Maria, who also manages her own website, received a message in her Webmaster Tools account that her site had been hacked. The message provided an example of a page added by hackers: http://example.com/where-to-buy-cialis-over-the-counter/. She talked to her hosting provider, who looked at the source code on the homepage but could not find any pharmaceutical keywords. When the hosting provider visited http://example.com/where-to-buy-cialis-over-the-counter/, it returned an error page. Maria also bought a malware scanning service, but the service was not able to find any malicious content on her site.

Maria then went to Webmaster Tools and used the Fetch as Google tool on the example URL Google had provided (http://example.com/where-to-buy-cialis-over-the-counter/), which returned no content. Confused, she filed a reconsideration request and received a rejection message that advised her to do two things:

  1. Verify the non-www version of her site, as hackers often try to hide content in folders that may be overlooked by the webmaster.

    While it may seem like http://example.com and http://www.example.com are the same site, Google actually treats them as different sites. http://example.com is referred to as the "root domain," while http://www.example.com is called the subdomain. Maria had verified http://www.example.com but not http://example.com, which was important because the pages added by hackers were non-www pages like http://example.com/where-to-buy-cialis-over-the-counter/. Once she verified http://example.com, she was able to see the hacked content on the provided URL with the Fetch as Google tool in Webmaster Tools.

  2. Check her .htaccess file for new rules.

    Maria talked to her hosting provider who showed her how to access her .htaccess file. She noticed right away that her .htaccess file had some strange content that she had not added:

    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (google|yahoo|msn|aol|bing) [OR]
    RewriteCond %{HTTP_REFERER} (google|yahoo|msn|aol|bing)
    RewriteRule ^([^/]*)/$ /main.php?p=$1 [L]
    </IfModule>

    The mod_rewrite rule you see above was inserted by the hacker and redirects anyone coming from certain search engines, as well as search engine crawlers, to main.php, which generates all of the hacked content. It's also possible for rules like these to redirect users accessing the site on a mobile device. On the same day, she also saw that a recent malware scan had found suspicious content in the main.php file. On top of that, she noticed an unknown user in the FTP users area of her website development software.

She removed the main.php file and the .htaccess file, deleted the unknown user from her FTP users area, and her site was no longer hacked!

Steps to prevent getting hacked in the future

  • Avoid using FTP when transferring files to your servers. FTP does not encrypt any traffic, including passwords. Instead, use SFTP, which will encrypt everything, including your password, as a protection against eavesdroppers examining network traffic.
  • Check the permissions on sensitive files like .htaccess. Your hosting provider may be able to assist you if you need help. The .htaccess file can be used to improve and protect your site, but it can also be used for malicious hacks if attackers are able to gain access to it.
  • Be vigilant and look for new and unfamiliar users in your administrative panel and any other place where there may be users that can modify your site.

We hope your site never gets hacked, but if it does, we have many resources for hacked webmasters on our Help for Hacked Sites page. If you need more help or would like to share your own tips, you can post in our Webmaster Help Forum. If you do post to the forum or submit a reconsideration request for your site, please include #NoHacked.

Webmaster level: advanced

Locale-adaptive pages change their content to reflect the user's language or perceived geographic location. Since, by default, Googlebot requests pages without setting an Accept-Language HTTP request header and uses IP addresses that appear to be located in the USA, not all content variants of locale-adaptive pages may be indexed completely.

Today we’re introducing new locale-aware crawl configurations for Googlebot for pages that we detect may adapt the content they serve based on the request's language and perceived location. These are:

  • Geo-distributed crawling, where Googlebot would start to use IP addresses that appear to come from outside the USA, in addition to the US-based IP addresses Googlebot currently uses.
  • Language-dependent crawling, where Googlebot would start to crawl with an Accept-Language HTTP header in the request.

As these new crawling configurations are enabled automatically for pages we detect to be locale-adaptive, you may notice changes in how we crawl and show your site in Google search results without you altering your CMS or server settings.

Note that these new configurations do not alter our recommendation to use separate URLs with rel=alternate hreflang annotations for each locale. We continue to support and recommend using separate URLs, as they are still the best way for users to interact with and share your content, and the best way to maximize indexing and ranking of all variants of your content.
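For reference, a minimal sketch of such annotations might look like the lines below; the URLs and language codes are placeholders, and each locale's page should carry the full set of annotations pointing at all of its alternates.

<!-- In the <head> of each locale's page -->
<link rel="alternate" hreflang="en" href="http://www.example.com/en/" />
<link rel="alternate" hreflang="de" href="http://www.example.com/de/" />
<link rel="alternate" hreflang="x-default" href="http://www.example.com/" />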

As always, if you have any questions or feedback, please tell us in the internationalization Webmaster Help Forum.

Webmaster level: all

Last year, we launched a new way for musical artists to list their upcoming events on Google: schema.org markup on their official websites. Now we’re expanding this program in four ways:

1. Official Ticket Links

For artists: if you mark up ticketing links along with the events on your official website, we'll show an expanded answer card for your events in Google search, including the on-sale date, availability, and a direct link to your preferred ticketing site.
As before, you may write the event markup directly into your site's HTML, or simply install an event widget that builds in the markup for you automatically, such as Bandsintown, BandPage, GigPress, ReverbNation, or Songkick.
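As an illustration only, a marked-up event with a ticketing link might look something like the sketch below in JSON-LD; every name, date, and URL is a placeholder, and the Developer Site linked at the end of this post documents the full set of required and recommended properties.

<!-- One event with its ticketing link: offers.url points at the ticketing page,
     while availability and validFrom convey availability and the on-sale date -->
<script type="application/ld+json">
{ "@context" : "http://schema.org",
  "@type" : "MusicEvent",
  "name" : "Your Band at Example Hall",
  "startDate" : "2015-06-20T20:00",
  "url" : "http://your-official-website.com/shows/example-hall",
  "location" : {
    "@type" : "Place",
    "name" : "Example Hall",
    "address" : "123 Main St, Anytown, CA"
  },
  "performer" : {
    "@type" : "MusicGroup",
    "name" : "Your Band or Performer Name"
  },
  "offers" : {
    "@type" : "Offer",
    "url" : "http://your-ticketing-site.com/example-hall-tickets",
    "availability" : "http://schema.org/InStock",
    "validFrom" : "2015-05-01T10:00"
  }
}
</script>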

2. Delegated Event Listings

What if you can't add markup or an event widget to your official website (for example, if your website doesn't list your events at all)? Now you can use delegation markup to tell us to source your events from a page of your choice on another website. Just add the following markup to your home page, making sure to customize the three values for name, url, and event:
<script type="application/ld+json">
{"@context" : "http://schema.org", 
 "@type" : "MusicGroup", 
 "name" : "Your Band or Performer Name", 
 "url" : "http://your-official-website.com", 
 "event" : "http://other-event-site.com/your-event-listing-page/"
}
</script>
The marked-up events found on the other event site's page will then be eligible for Google events features. Examples of sites you can point to in the “event” field include bandpage.com, bandsintown.com, songkick.com, and ticketmaster.com.

3. Comedian Events

Hey, funny people! We want your performances to show up on Google, too. Just add ComedyEvent markup to your official website. Or, if another site like laughstub.com has your complete event listings, use delegation markup on your home page to point us their way.

4. Venue Events

Last but definitely not least: we're starting to show venue event listings in Google Search. Concert venues, theaters, libraries, fairgrounds, and so on: make your upcoming events eligible for display across Google by adding Event markup to your official website.
As with artist events, you have a choice of writing the event markup directly into your site's HTML, or using a widget or plugin that builds in the markup for you. Also, if all your events are ticketed by a primary ticketer whose website provides markup, you don't have to do anything! Google will read the ticketer's markup and apply it toward your venue's event listings.

For example, venues ticketed by Ticketmaster, including its international sites and TicketWeb, will automatically be covered. The same goes for venues that list events with Ticketfly, AXS, LaughStub, Wantickets, Holdmyticket, ShowClix, Stranger Tickets, Ticket Alternative, Digitick, See Tickets, Tix, Fnac Spectacles, Ticketland.ru, iTickets, MIDWESTIX, Ticketleap, or Instantseats. All of these have already implemented ticketer events markup.
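If you do add the markup to your venue's site yourself, a bare-bones event entry might look something like the following sketch in JSON-LD; the names, dates, and addresses are placeholders, and the documentation below covers the supported properties in full.

<!-- A single venue event; add one such block for each upcoming event -->
<script type="application/ld+json">
{ "@context" : "http://schema.org",
  "@type" : "Event",
  "name" : "An Evening of Improv",
  "startDate" : "2015-07-11T19:30",
  "url" : "http://your-venue-website.com/events/an-evening-of-improv",
  "location" : {
    "@type" : "Place",
    "name" : "Your Venue Name",
    "address" : {
      "@type" : "PostalAddress",
      "streetAddress" : "123 Main Street",
      "addressLocality" : "Anytown",
      "addressRegion" : "CA",
      "postalCode" : "12345"
    }
  }
}
</script>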

Please see our Developer Site for full documentation of these features, including a video tutorial on how to write and test event markup. Then add the markup, help new fans discover your events, and play to a packed house!

Webmaster level: all

Structured data markup helps your content get discovered in search results and across Google properties. We’re excited to share several updates to help you author and publish markup on your website:

Structured Data Testing Tool
The new Structured Data Testing Tool better reflects how Google interprets a web page’s structured data markup.
It provides the following features:
  • Validation for all Google features powered by structured data
  • Support for markup in the JSON-LD syntax, including in dynamic HTML pages
  • Clean display of the structured data items on your page
  • Syntax highlighting of markup problems right in your HTML source code

New documentation and simpler policy

We've clarified our documentation for the vocabulary supported in structured data, based on webmasters' feedback. The new documentation explains the markup you need to add to enable different search features for your content, along with code examples in the supported syntaxes. We'll be retiring the old documentation soon.

We've also simplified and clarified our policies on using structured data. If you believe that another site is abusing Google's rich snippets quality guidelines, please let us know using the rich snippets spam report form.

Expanded support for JSON-LD

We've extended our support for schema.org vocabulary in JSON-LD syntax to new use cases: company logos and contacts, social profile links, events in the Knowledge Graph, the sitelinks search box, and event rich snippets. We're working on expanding support to additional markup-powered features in the future.
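As a rough illustration, company logo, contact, and social profile markup in JSON-LD can look like the sketch below; the company name, telephone number, and URLs are all placeholders, and our structured data documentation describes the exact properties each feature expects.

<!-- Organization markup: logo, a customer service contact point, and social profile links (sameAs) -->
<script type="application/ld+json">
{ "@context" : "http://schema.org",
  "@type" : "Organization",
  "name" : "Example Company",
  "url" : "http://www.example.com",
  "logo" : "http://www.example.com/images/logo.png",
  "contactPoint" : [{
    "@type" : "ContactPoint",
    "telephone" : "+1-800-555-1212",
    "contactType" : "customer service"
  }],
  "sameAs" : [
    "http://www.facebook.com/examplecompany",
    "http://twitter.com/examplecompany"
  ]
}
</script>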

As always, we welcome your feedback and questions; please post in our Webmaster Help forums.

Webmaster level: advanced

Recently the Google Public DNS team, in collaboration with Akamai, reached an important milestone: Google Public DNS now propagates client location information to Akamai nameservers. This effort significantly improves the accuracy of approximately 30% of the location-sensitive DNS responses returned by Google Public DNS. In other words, client requests to Akamai hosted content can be routed to closer servers with lower latency and greater data transfer throughput. Overall, Google Public DNS resolvers serve 400 billion responses per day and more than 50% of them are location-sensitive.

DNS is often used by Content Distribution Networks (CDNs) such as Akamai to achieve location-based load balancing by constructing responses based on clients’ IP addresses. However, CDNs usually see the DNS resolvers’ IP address instead of the actual clients’ and are therefore forced to assume that the resolvers are close to the clients. Unfortunately, the assumption is not always true. Many resolvers, especially those open to the Internet at large, are not deployed at every single local network.

To solve this issue, a group of DNS and content providers, including Google, proposed an approach that allows resolvers to forward the client's subnet to CDN nameservers in an extension field in the DNS request. The subnet is a portion of the client's IP address, truncated to preserve privacy; for example, a client at 198.51.100.37 might be forwarded only as the /24 prefix 198.51.100.0. The approach is officially named edns-client-subnet, or ECS.

This solution requires that both resolvers and CDNs adopt the new DNS extension. Google Public DNS resolvers automatically probe to discover ECS-aware nameservers and have observed the footprint of ECS support from CDNs expanding steadily over the past years. By now, more than 4000 nameservers from approximately 300 content providers support ECS. The Google-Akamai collaboration marks a significant milestone in our ongoing efforts to ensure DNS contributes to keeping the Internet fast. We encourage more CDNs to join us by supporting the ECS option.

For more information about Google Public DNS, please visit our website. For CDN operators, please also visit “A Faster Internet” for more technical details.


Webmaster Level: intermediate to advanced

App deep links are the new kid on the block in organic search, and they’re picking up speed faster than you can say “schema.org ViewAction”! For signed-in users, 15% of Google searches on Android now return deep links to apps through App Indexing. And over just the past quarter, we've seen the number of clicks on app deep links jump by 10x.
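For context, a web page commonly declares its corresponding app deep link with a rel="alternate" link element like the sketch below, where the package name and path are placeholders; the same association can also be listed in a sitemap or expressed with schema.org ViewAction markup.

<!-- In the <head> of the web page that corresponds to the app screen -->
<link rel="alternate" href="android-app://com.example.android/http/example.com/gizmos" />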

Since we opened up App Indexing back in June, we’ve gotten a lot of feedback from developers and seen a lot of implementations that went right, as well as others that were good learning experiences. We’d like to share with you four key steps to monitor app performance and drive user engagement:

1. Give your app developer access to Webmaster Tools

App indexing is a team effort between you (as a webmaster) and your app development team. We show information in Webmaster Tools that is key for your app developers to do their job well. Here’s what’s available right now:

  • Errors in indexed pages within apps
  • Weekly clicks and impressions from app deep links via Google search
  • Stats on your sitemap (if that’s how you implemented the app deep links)
...and we plan to add a lot more in the coming months!

We’ve noticed that very few developers have access to Webmaster Tools. So if you want your app development team to get all of the information they need to fix app-related issues, it’s essential for them to have access to Webmaster Tools.

Any verified site owner can add a new user. Pick restricted or full permissions, depending on the level of access you’d like to give.

2. Understand how your app is doing in search results

How are users engaging with your app from search results? We’ve introduced two new ways for you to track performance for your app deep links:

  • We now send a weekly clicks and impressions update to the Message center in your Webmaster Tools account.
  • You can now track how much traffic app deep links drive to your app using referrer information, specifically the referrer extra in the ACTION_VIEW intent. We're working to integrate this information with Google Analytics for even easier access. Learn how to track referrer information on our Developer site.

3. Make sure key app resources can be crawled

Blocked resources are one of the top reasons for the “content mismatch” errors you see in Webmaster Tools’ Crawl Errors report. We need access to all the resources necessary to render your app page. This allows us to assess whether your associated web page has the same content as your app page.

To help you find and fix these issues, we now show you the specific resources we can’t access that are critical for rendering your app page. If you see a content mismatch error for your app, look out for the list of blocked resources in “Step 5” of the details dialog.

4. Watch out for Android App errors

To help you identify errors when indexing your app, we’ll send you messages for all app errors we detect, and will also display most of them in the “Android apps” tab of the Crawl errors report.

In addition to the currently available “Content mismatch” and “Intent URI not supported” error alerts, we’re introducing three new error types:

  • APK not found: we can’t find the package corresponding to the app.
  • No first-click free: the link to your app does not lead directly to the content, but requires login to access.
  • Back button violation: after following the link to your app, the back button did not return to search results.

In our experience, most errors are caused by a general setting in your app (e.g. a blocked resource, or a region picker that pops up when the user tries to open the app from search). Taking care of that generally resolves the error for all affected URIs.

Good luck in the pursuit of appiness! As always, if you have questions, feel free to drop by our Webmaster help forum.

reCAPTCHA protects the websites you love from spam and abuse. So, when you go online—say, for some last-minute holiday shopping—you won't be competing with robots and abusive scripts to access sites. For years, we’ve prompted users to confirm they aren’t robots by asking them to read distorted text and type it into a box.
But, we figured it would be easier to just directly ask our users whether or not they are robots—so, we did! We’ve begun rolling out a new API that radically simplifies the reCAPTCHA experience. We’re calling it the “No CAPTCHA reCAPTCHA.”
On websites using this new API, a significant number of users will be able to securely and easily verify they’re human without actually having to solve a CAPTCHA. Instead, with just a single click, they’ll confirm they are not a robot.
A brief history of CAPTCHAs 

While the new reCAPTCHA API may sound simple, there is a high degree of sophistication behind that modest checkbox. CAPTCHAs have long relied on the inability of robots to solve distorted text. However, our research recently showed that today’s Artificial Intelligence technology can solve even the most difficult variant of distorted text at 99.8% accuracy. Thus distorted text, on its own, is no longer a dependable test.

To counter this, last year we developed an Advanced Risk Analysis backend for reCAPTCHA that actively considers a user’s entire engagement with the CAPTCHA—before, during, and after—to determine whether that user is a human. This enables us to rely less on typing distorted text and, in turn, offer a better experience for users.  We talked about this in our Valentine’s Day post earlier this year.

The new API is the next step in this steady evolution. Now, humans can just check the box and in most cases, they’re through the challenge.

Are you sure you’re not a robot?

However, CAPTCHAs aren't going away just yet. In cases when the risk analysis engine can't confidently predict whether a user is a human or an abusive agent, it will prompt a CAPTCHA to elicit more cues, increasing the number of security checkpoints to confirm the user is valid.
Making reCAPTCHAs mobile-friendly

This new API also lets us experiment with new types of challenges that are easier for us humans to use, particularly on mobile devices. One such challenge is based on a classic computer vision problem: image labeling. In this version of the CAPTCHA challenge, you’re asked to select all of the images that correspond with the clue. It's much easier to tap photos of cats or turkeys than to tediously type a line of distorted text on your phone.
Adopting the new API on your site

As more websites adopt the new API, more people will see "No CAPTCHA reCAPTCHAs". Early adopters like Snapchat, WordPress, Humble Bundle, and several others are already seeing great results with this new API. For example, in the last week, more than 60% of WordPress’ traffic and more than 80% of Humble Bundle’s traffic on reCAPTCHA encountered the No CAPTCHA experience—users got to these sites faster. To adopt the new reCAPTCHA for your website, visit our site to learn more.
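Client-side integration takes just a small amount of markup. A minimal sketch looks something like the snippet below, where the data-sitekey value is a placeholder for the key you receive when you register your site; your form handler still needs to verify the user's reCAPTCHA response on the server side.

<!-- Load the reCAPTCHA API and render the "No CAPTCHA" checkbox widget inside a form -->
<script src="https://www.google.com/recaptcha/api.js" async defer></script>
<form action="submit-comment" method="POST">
  <div class="g-recaptcha" data-sitekey="your_site_key"></div>
  <input type="submit" value="Submit">
</form>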

Humans, we'll continue our work to keep the Internet safe and easy to use. Abusive bots and scripts, it’ll only get worse—sorry we’re (still) not sorry.