
Many frontend frameworks rely on JavaScript to show content. This means it can take Google some time to index your content or to update already-indexed content.
A workaround we discussed at Google I/O this year is dynamic rendering. There are many ways to implement this. This blog post shows an example implementation of dynamic rendering using Rendertron, which is an open source solution based on headless Chromium.

Which sites should consider dynamic rendering?

Not all search engines or social media bots visiting your website can run JavaScript. Googlebot, for example, might take time to run your JavaScript and has some limitations.
Dynamic rendering is useful for content that changes often and needs JavaScript to display. Your site's user experience (especially the time to first meaningful paint) may benefit from considering hybrid rendering (for example, Angular Universal).

How does dynamic rendering work?

Dynamic rendering means switching between client-side rendered and pre-rendered content for specific user agents.
You will need a renderer to execute the JavaScript and produce static HTML. Rendertron is an open source project that uses headless Chromium to render. Single Page Apps often load data in the background or defer work to render their content. Rendertron has mechanisms to determine when a website has completed rendering. It waits until all network requests have finished and there is no outstanding work.

This post covers:
  1. Taking a look at a sample web app
  2. Setting up a small express.js server to serve the web app
  3. Installing and configuring Rendertron as a middleware for dynamic rendering

The sample web app

The “kitten corner” web app uses JavaScript to load a variety of cat images from an API and display them in a grid.
Cute cat images in a grid and a button to show more - this web app truly has it all!
Here is the JavaScript:

const apiUrl = 'https://api.thecatapi.com/v1/images/search?limit=50';

const tpl = document.querySelector('template').content;
const container = document.querySelector('ul');

function init() {
  fetch(apiUrl)
    .then(response => response.json())
    .then(cats => {
      container.innerHTML = '';
      cats
        .map(cat => {
          // Clone the <template> content and point the image at the cat URL.
          const li = document.importNode(tpl, true);
          li.querySelector('img').src = cat.url;
          return li;
        })
        .forEach(li => container.appendChild(li));
    });
}

init();

document.querySelector('button').addEventListener('click', init);

The web app uses modern JavaScript (ES6), which isn't supported in Googlebot yet. We can use the mobile-friendly test to check if Googlebot can see the content:
The mobile-friendly test shows that the page is mobile-friendly, but the screenshot is missing all the cats! The headline and button appear but none of the cat pictures are there.
While this problem is simple to fix, it's a good exercise to learn how to set up dynamic rendering. Dynamic rendering will allow Googlebot to see the cat pictures without changes to the web app code.
Set up the server
To serve the web application, let's use express, a node.js library for building web servers.
The server code looks like this (find the full project source code here):

const express = require('express');

const app = express();

const DIST_FOLDER = process.cwd() + '/docs';
const PORT = process.env.PORT || 8080;

// Serve static assets (images, css, etc.)
app.get('*.*', express.static(DIST_FOLDER));

// Point all other URLs to index.html for our single page app
app.get('*', (req, res) => {
  res.sendFile(DIST_FOLDER + '/index.html');
});

// Start Express Server
app.listen(PORT, () => {
  console.log(`Node Express server listening on http://localhost:${PORT} from ${DIST_FOLDER}`);
});

You can try the live example here - you should see a bunch of cat pictures if you are using a modern browser. To run the project from your own computer, you need node.js installed to run the following commands:

npm install --save express rendertron-middleware
node server.js

Then point your browser to http://localhost:8080. Now it’s time to set up dynamic rendering.
Deploy a Rendertron instance
Rendertron runs a server that takes a URL and returns static HTML for that URL by using headless Chromium. We'll follow the recommendation from the Rendertron project and use Google Cloud Platform.
The form to create a new Google Cloud Platform project.
Please note that while you can get started with the free usage tier, using this setup in production may incur costs according to the Google Cloud Platform pricing.

  1. Create a new project in the Google Cloud console. Take note of the “Project ID” below the input field.
  2. Clone the Rendertron repository from GitHub with:
    git clone https://github.com/GoogleChrome/rendertron.git 
    cd rendertron 
  3. Run the following commands to install dependencies and build Rendertron on your computer:
    npm install && npm run build
  4. Enable Rendertron’s cache by creating a new file called config.json in the rendertron directory with the following content:
    { "datastoreCache": true }
  5. Run the following command from the rendertron directory. Substitute YOUR_PROJECT_ID with your project ID from step 1.
    gcloud app deploy app.yaml --project YOUR_PROJECT_ID

  6. Select a region of your choice and confirm the deployment. Wait for it to finish.

  7. Enter the URL YOUR_PROJECT_ID.appspot.com (substituting YOUR_PROJECT_ID with your actual project ID from step 1) in your browser. You should see Rendertron’s interface with an input field and a few buttons.
Rendertron’s UI after deploying to Google Cloud Platform
When you see the Rendertron web interface, you have successfully deployed your own Rendertron instance. Take note of your project’s URL (YOUR_PROJECT_ID.appspot.com) as you will need it in the next part of the process.
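You can also fetch pre-rendered HTML directly: Rendertron exposes a render endpoint that takes the target URL in the path (the same endpoint the middleware will proxy to in the next step). For example, using example.com as a stand-in for your own site:

curl https://YOUR_PROJECT_ID.appspot.com/render/https://example.com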
Add Rendertron to the server
The web server is using express.js and Rendertron has an express.js middleware. Run the following command in the directory of the server.js file:

npm install --save rendertron-middleware

This command installs the rendertron-middleware from npm so we can add it to the server:

const express = require('express');
const rendertron = require('rendertron-middleware');

const app = express();

Configure the bot list
Rendertron uses the user-agent HTTP header to determine if a request comes from a bot or a user’s browser. It has a well-maintained list of bot user agents to compare with. By default this list does not include Googlebot, because Googlebot can execute JavaScript. To make Rendertron render Googlebot requests as well, add Googlebot to the list of user agents:

const BOTS = rendertron.botUserAgents.concat('googlebot');

const BOT_UA_PATTERN = new RegExp(BOTS.join('|'), 'i');

Rendertron compares the user-agent header against this regular expression later.
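As a quick illustration of that comparison (the user-agent string below is just a typical Googlebot example), the pattern matches case-insensitively anywhere in the header:

// Example check, not part of the server code:
const ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
console.log(BOT_UA_PATTERN.test(ua)); // true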
Add the middleware
To send bot requests to the Rendertron instance, we need to add the middleware to our express.js server. The middleware checks the requesting user agent and forwards requests from known bots to the Rendertron instance. Add the following code to server.js and don’t forget to substitute “YOUR_PROJECT_ID” with your Google Cloud Platform project ID:

app.use(rendertron.makeMiddleware({
  proxyUrl: 'https://YOUR_PROJECT_ID.appspot.com/render',
  userAgentPattern: BOT_UA_PATTERN
}));

Bots requesting the sample website receive the static HTML from Rendertron, so the bots don’t need to run JavaScript to display the content.
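Putting the pieces together, server.js now looks roughly like this (a sketch assembled from the snippets above; note that the middleware is registered before the routes, so bot requests are intercepted before they reach the static and index.html handlers):

const express = require('express');
const rendertron = require('rendertron-middleware');

const app = express();

const DIST_FOLDER = process.cwd() + '/docs';
const PORT = process.env.PORT || 8080;

const BOTS = rendertron.botUserAgents.concat('googlebot');
const BOT_UA_PATTERN = new RegExp(BOTS.join('|'), 'i');

// Forward requests from known bots to the Rendertron instance
app.use(rendertron.makeMiddleware({
  proxyUrl: 'https://YOUR_PROJECT_ID.appspot.com/render',
  userAgentPattern: BOT_UA_PATTERN
}));

// Serve static assets (images, css, etc.)
app.get('*.*', express.static(DIST_FOLDER));

// Point all other URLs to index.html for our single page app
app.get('*', (req, res) => {
  res.sendFile(DIST_FOLDER + '/index.html');
});

// Start Express Server
app.listen(PORT, () => {
  console.log(`Node Express server listening on http://localhost:${PORT} from ${DIST_FOLDER}`);
});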
Testing our setup
To test if the Rendertron setup was successful, run the mobile-friendly test again.
Unlike in the first test, the cat pictures are now visible. In the HTML tab we can see the full HTML that the JavaScript generated, confirming that Rendertron has removed the need to run JavaScript to display the content.
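You can also check the middleware from the command line by sending a request with a bot user agent and comparing the response with a normal request. Note that the page needs to be reachable from your Rendertron instance - a cloud-hosted Rendertron cannot fetch http://localhost:8080 - so run this against a deployed copy of the site (the hostname below is a placeholder):

curl -A "googlebot" https://your-deployed-site.example/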
Conclusion
You created a dynamic rendering setup without making any changes to the web app. With these changes, you can serve a static HTML version of the web app to crawlers.

Over the last year, the new Search Console has been growing and growing, with the goal of making it easier for site owners to focus on the important tasks. For us, focus means putting all of our work into the new Search Console, staying committed to its users, and, with that, being able to turn off some of the older, perhaps already-improved-upon, aspects of the old Search Console. This gives us space to further build out the new Search Console, adding and improving features over time.

Here are some of the upcoming changes in Search Console that we're planning to make towards the end of March 2019:

Crawl errors in the new Index Coverage report

One of the more common pieces of feedback we received was that the list of crawl errors in Search Console was not actionable when it came to setting priorities (it's normal that Google crawls URLs that don't exist; that's not something that needs to be fixed on the website). By shifting the focus to the issues and patterns that affect site indexing, we believe that site owners will be able to find and fix issues much faster (and once issues are fixed, you can request reprocessing quickly too). With this, we're going to remove the old Crawl Errors report - for desktop, smartphone, and site-wide errors. We'll continue to improve the way issues are recognized and flagged, so if there's something that would help you, please submit feedback in the tools.

Along with the Crawl Errors report, we're also deprecating the crawl errors API that's based on the same internal systems. At the moment, we don't have a replacement for this API. We'll inform API users of this change directly.

Sitemaps data in Index Coverage

As we move forward with the new Search Console, we're turning the old sitemaps report off. The new sitemaps report has most of the functionality of the old report, and we're aiming to bring the rest of the information - specifically for images & video - to the new reports over time. Moreover, to track URLs submitted in sitemap files, within the Index Coverage report you can select and filter using your sitemap files. This makes it easier to focus on URLs that you care about.

Using the URL inspection tool to fetch as Google

The new URL inspection tool offers many ways to check and review URLs on your website. It provides both a look into the current indexing of a URL and a live check of URLs that you've recently changed. The tool now shows even more information about URLs, such as the HTTP headers, page resources, the JavaScript console log, and a screenshot of the page. From there, you can also submit pages for reprocessing, to have them added or updated in our search results as quickly as possible.

User-management is now in settings

We've improved the user management interface and decreased clutter from the tool by merging it with the Settings section of the new Search Console. This replaces the user-management features in the old Search Console.

Structured data dashboard to dedicated reports per vertical

To help you implement Rich Results for your site, we added several reports to the new Search Console last year. These include Jobs, Recipes, Events, and Q&A. We are committed to keep adding reports like these to the new Search Console. When Google encounters a syntax error while parsing Structured Data on a page, it will also be reported in aggregate, to make sure you don’t miss anything critical.

Other Structured Data types that are not supported with Rich Results features will no longer be reported in Search Console. We hope this reduces distraction from non-critical issues, and helps you focus on fixing problems that could be visible in Search.

Letting go of some old features

With the focus on features that we believe are critical to site owners, we've had to make a hard decision to drop some features in Search Console. In particular:

HTML suggestions - finding short and duplicated titles can be useful for site owners, but Google's algorithms have gotten better at showing and improving titles over the years. We still believe this is something useful for sites to look into, and there are some really good tools that help you to crawl your website to extract titles & descriptions too.

Property Sets - while they're loved by some site owners, the small number of users makes it hard to justify maintaining this feature. However, we did learn that users need a more comprehensive view of their website, so we will soon add the option of managing a Search Console account over an entire domain (regardless of scheme and sub-domains). Stay tuned!

Android Apps - most of the relevant functionality has been moved to the Firebase console over the years.

Blocked resources - we added this functionality several years back to help sites unblock CSS and JavaScript files for mobile-friendliness. In the meantime, these issues have become much less common, usage of this tool has dropped significantly, and you can find blocked resources directly in the URL inspection tool.

Please send us feedback!

We realize some of these changes will affect your workflows, so we want to let you know about them as early as possible. Please send us your feedback directly in the new Search Console if there are aspects which are unclear, or which would ideally be different for your use-case. For more detailed feedback, please use our help forums; feel free to include screenshots & ideas. In the long run, we believe the new Search Console will make things much easier, helping you focus on the issues affecting your site and the opportunities available to it in Search.

We're looking forward to an exciting year!


With the New Year now underway, we'd like to offer some best practices and advice we hope will lead publishers to more success within Google News in 2019.

General advice

There is a lot of helpful information to consider within the Google News Publisher Help Center. Be sure to read the material there, in particular the content and technical guidelines.

Headlines and dates


  • Present clear headlines: Google News looks at a variety of signals to determine the headline of an article, including your HTML title tag and the most prominent text on the page. Review our headline tips.
  • Provide accurate times and dates: Google News tries to determine the time and date to display for an article in a variety of ways. You can help ensure we get it right by using the following methods:
    • Show one clear date and time: As per our date guidelines, show a clear, visible date and time between the headline and the article text. Prevent other dates from appearing on the page whenever possible, such as for related stories.
    • Use structured data: Use the datePublished and dateModified schema and use the correct time zone designator, for both AMP and non-AMP pages (see the sketch after this list).
  • Avoid artificially freshening stories: If an article has been substantially changed, it can make sense to give it a fresh date and time. However, don't artificially freshen a story without adding significant information or some other compelling reason for the freshening. Also, do not create a very slightly updated story from one previously published, then delete the old story and redirect to the new one. That's against our article URLs guidelines.
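As a rough illustration of the structured data advice above, here is a minimal JSON-LD sketch for an article page (the headline and timestamps are placeholders; note the ISO 8601 dates with an explicit time zone designator):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example headline",
  "datePublished": "2019-01-15T08:00:00+01:00",
  "dateModified": "2019-01-15T09:30:00+01:00"
}
</script>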

Duplicate content

Google News seeks to reward independent, original journalistic content by giving credit to the originating publisher, as both users and publishers would prefer. This means we try not to allow duplicate content—which includes scraped, rewritten, or republished material—to perform better than the original content. In line with this, these are guidelines publishers should follow:

  • Block scraped content: Scraping commonly refers to taking material from another site, often on an automated basis. Sites that scrape content must block scraped content from Google News.
  • Block rewritten content: Rewriting refers to taking material from another site, then rewriting that material so that it is not identical. Sites that rewrite content in a way that provides no substantial or clear added value must block that rewritten content from Google News. This includes, but is not limited to, rewrites that make only very slight changes or those that make many word replacements but still keep the original article's overall meaning.
  • Block or consider canonical for republished content: Republishing refers to when a publisher has permission from another publisher or author to republish an original work, such as material from wire services or in partnership with other publications.
    Publishers that allow others to republish content can help ensure that their original versions perform better in Google News by asking those republishing to block or make use of canonical.
    Google News also encourages those that republish material to consider proactively blocking such content or making use of the canonical, so that we can better identify the original content and credit it appropriately.
  • Avoid duplicate content: If you operate a network of news sites that share content, the advice above about republishing is applicable to your network. Select what you consider to be the original article and consider blocking duplicates or making use of the canonical to point to the original (see the examples after this list).
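To make the blocking and canonical options above concrete: a site could keep republished copies out of Google News via robots.txt (assuming, for illustration, the copies live under a /wire/ path):

User-agent: Googlebot-News
Disallow: /wire/

Alternatively, each republished page could point its canonical at the original (the URL is a placeholder):

<link rel="canonical" href="https://original-publisher.example/original-story">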

Transparency


  • Be transparent: Visitors to your site want to trust and understand who publishes it, along with information about those who have written articles. That's why our content guidelines stress that sites should have posts with clear bylines, information about authors, and contact information for the publication.
  • Don't be deceptive: Our content policies do not allow sites or accounts that impersonate any person or organization, or that misrepresent or conceal their ownership or primary purpose. We do not allow sites or accounts that engage in coordinated activity to mislead users. This includes, but isn't limited to, sites or accounts that misrepresent or conceal their country of origin or that direct content at users in another country under false premises.

More tips


  • Avoid taking part in link schemes: Don't participate in link schemes, which can include large-scale article marketing programs or selling links that pass PageRank. Review our page on link schemes for more information.
  • Use structured data for rich presentation: Whether you use AMP or non-AMP pages, you can make use of structured data to optimize your content for rich results or carousel-like presentations.
  • Protect your users and their data: Consider securing every page of your website with HTTPS to protect the integrity and confidentiality of the data users exchange on your site. You can find more useful tips in our best practices on how to implement HTTPS.

Here's to a great 2019!

We hope these tips help publishers succeed in Google News over the coming year. For those who have more questions about Google News: we are unable to offer one-to-one support, but we do monitor our newly revamped Google News Publisher Forum and try to provide guidance on questions that might help a number of publishers all at once. The forum is also a great resource where publishers share tips and advice with each other.

For every train there's a passenger, but it turns out comments are not our train.

Over the years we read thousands of comments we've received on our blog posts on the Google Webmaster Central blog. Sometimes they were extremely thoughtful, other times they made us laugh out loud, but most of the time they were off-topic or even outright spammy; if you think about it, the latter is rather ironic, considering this is the Google Webmaster Blog.

Effective today, we're closing the commenting feature on the Google Webmaster Central blog. Instead of reading the comments here on the blog, we're going to focus on interacting with the community on our other channels. For all of our subsequent posts, if you have comments, feedback, or funny stories, you can find us in our help forums or on Twitter.

Posted by Gary, House elf