


A while back, a group of us Google Cloud Platform Developer Programs Engineers teamed up with gaming fans in Firebase Engineering to work on an interesting project. We all love games, gamers, and game developers, and we wanted to support those developers with solutions that accomplish common tasks so they can focus more on what they do best: making great games.

The result was Firebase Unity Solutions. It’s an open-source GitHub repository with sample projects and scripts. These projects utilize Firebase tools and services to help you add cloud-based features to games built with Unity.

Each feature will include all the required scripts, a demo scene, any custom editors to help you better understand and use the provided assets, and a tutorial to use as a step-by-step guide for incorporating the feature into your game.

The only requirements are a Unity project with the .NET 2.0 API level enabled, and a project created with the Firebase Console.

Introducing Firebase Leaderboard


Our debut project is the Firebase_Leaderboard, a set of scripts that utilize Firebase Realtime Database to create and manage a cross-platform high score leaderboard. With the LeaderboardController MonoBehaviour, you can retrieve any number of unique users’ top scores from any time frame. Want the top 5 scores from the last 24 hours? Done. How about the top 100 from last week? You got it.

Once a connection to Firebase is established, scores are retrieved automatically, including any new scores that come in while the controller is enabled.

If any of those parameters are modified (the number of scores to retrieve, or the start or end date), the scores are automatically refreshed. The content is always up-to-date!

private void Start() {
    // Find the LeaderboardController in the scene and subscribe to its events.
    this.leaderboard = FindObjectOfType<LeaderboardController>();
    leaderboard.FirebaseInitialized += OnInitialized;
    leaderboard.TopScoresUpdated += UpdateScoreDisplay;
    leaderboard.UserScoreUpdated += UpdateUserScoreDisplay;
    leaderboard.ScoreAdded += ScoreAdded;

    MessageText.text = "Connecting to Leaderboard...";
}
With the same component, you can add new scores for current users as well, meaning a single script handles both read and write operations on the top score data.

public void AddScore(string userId, int score) {
    leaderboard.AddScore(userId, score);
}
For step-by-step instructions on incorporating this cross-platform leaderboard into your Unity game using Firebase Realtime Database, follow the instructions here. Or check out the Demo Scene to see a version of the leaderboard in action!

We want to hear from you

We have ideas for what features to add to this repository moving forward, but we want to hear from you, too! What game feature would you love to see implemented in Unity using Firebase tools? What cloud-based functionality would you like to be able to drop directly into your game? And how can we improve the Leaderboard, or other solutions as they are added? You can comment below, create feature requests and file bugs on the GitHub repo, or join the discussion in this Google Group.

Let’s make great games together!



When preparing to migrate a legacy system to a cloud-based data analytics solution, as engineers we often focus on the technical benefits: Queries will run faster, more data can be processed and storage no longer has limits. For IT teams, these are significant, positive developments for the business. End users, though, may not immediately see the benefits of this technology (and internal culture) change. For your end users, running macros in their spreadsheet software of choice or expecting a query to return data in a matter of days (and planning their calendar around this) is the absolute norm. These users, more often than not, don’t see the technology stack changes as a benefit. Instead, they become a hindrance. They now need to learn new tools, change their workflows and adapt to the new world of having their data stored more than a few milliseconds away—and that can seem like a lot to ask from their perspective.

It’s important that you remember these users at all stages of a migration to cloud services. I’ve worked with many companies moving to the cloud, and I’ve seen how easy it is to forget the end users during a cloud migration, until you get a deluge of support tickets letting you know that their tried-and-tested methods of analyzing data no longer work. These added tickets increase operational overhead on the support and information technology departments, and decrease the number of hours that can be spent on doing the useful, transformative work—that is, analyzing the wealth of data that you now have available. Instead, you can end up wasting time trying to mold these old, inconvenient processes to fit this new cloud world, because you don’t have the time to transform into a cloud-first approach.

There are a few essential steps you can take to successfully move your enterprise users to this cloud-first approach.

1. Understand the scope

There are a few questions you should ask your team and any other teams inside your organization that will handle any stored or accessed data.
  • Where is the data coming from?
  • How much data do we process?
  • What tools do we use to consume and analyze the data?
  • What happens to the output that we collect?

When you understand these fundamentals during the initial scoping of a potential data migration, you’ll understand the true impact that such a project will have on those users consuming the affected data. It’s rarely as simple as “just point your tool at the new location.” A cloud migration could massively increase expected bandwidth costs if the tools aren’t well-tuned for a cloud-based approach—for example, by downloading the entire data set before analyzing the required subset.
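
For example, a tool that downloads an entire table locally before filtering it will pull vastly more bytes over the network than one that pushes the filter to the warehouse. Here is a hedged sketch of the cloud-friendly pattern using the BigQuery Python client; the project, table and column names are placeholders, and process() stands in for whatever the team does next:

from google.cloud import bigquery

client = bigquery.Client()

# Push the filter and projection into BigQuery: only the rows and columns
# actually needed cross the network, instead of the whole table.
sql = """
    SELECT order_id, total
    FROM `my-project.sales.orders`      -- placeholder table
    WHERE order_date >= '2018-01-01'
"""
for row in client.query(sql):
    process(row)  # placeholder for the team's downstream analysis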

To avoid issues like this, conduct interviews with the teams that consume the data. Seek to understand how they use and manipulate the data they have access to, and how they gain access to that data in the first place. This will all need to be replicated in the new cloud-based approach, and it likely won’t map directly. Consider using IAM unobtrusively to grant teams access to the data they need today; that sets you up to expand this scope easily and painlessly in the future. Understand the tools in use today, and reach out to vendors to clarify any points. Don’t assume a tool does something if you don’t have documentation and evidence. It might look like the tool queries only the small section of data it requires, but you can’t know what’s going on behind the scenes unless you wrote it yourself!

Once you’ve gathered this information, develop clear guidelines for what new data analytics tooling should be used after a cloud migration, and whether it is intended as a substitute or a complement to the existing tooling. It is important to be opinionated here. Your users will be looking to you for guidance and support with new tooling. Since you’ll have spoken to them extensively beforehand, you’ll understand their use cases and can make informed, practical recommendations for tooling. This also allows you to scope training requirements. You can’t expect users to just pick up new tools and be as productive as they had been right away. Get users trained and comfortable with new tools before the migration happens.

2. Establish champions

Teams or individuals will sometimes stand against technology change. This can be for a variety of reasons, including worries over job security, comfort with existing methods or misunderstanding of the goals of the project. By finding and utilizing champions within each team, you’ll solve a number of problems:
  • Training challenges. Mass training is impersonal and can’t be tailored per team. Champions can deliver custom training that will hit home with their team.
  • Transition difficulties. Individual struggles by team can be hard to track and manage. By giving each team a voice through their champion, users will feel more involved in the project, and their issues are more likely to be addressed, reducing friction in the final stages.
  • Overloaded support teams. Champions become the voice of the project within the team too. This can have the effect of reducing support workload in the days, weeks and months during and after a migration, since the champion can be the first port of call when things aren’t running quite as expected.
Don’t underestimate the power of having people represent the project on their own teams, rather than someone outside the team proposing change to an established workflow. The former is much more likely to be favorably received.

3. Promote the cloud transformation

It is more than likely that the current methods of data ingestion and analysis, and possibly the methods of data output and storage, will be suboptimal, or worse, impossible, under the new cloud model. It is important that teams are suitably prepared for these changes. To make the transition easier, consider taking these approaches to informing users and allowing them room to experiment.

  • Promote an understanding of what it means to have the power of the cloud behind the data. It’s an opportunity to ask questions of data that might otherwise have been locked away, whether behind time constraints, incompatibility with software, or even a lack of awareness that the data was available to query. By combining data sets, can you and your teams become more evidence-driven, and get better results that answer deeper, more important questions? Invariably, the answer is yes.
  • In the case that an existing tool will continue to be used, it will be invaluable to provide teams with new data locations and instructions for reconfiguring applications. It is important that this is communicated, whether or not the change will be apparent to the user. Undoubtedly, some custom configuration somewhere will break, but you can reduce the frustration of an interruption by having the right information available.
  • By having teams develop and build new tooling early, rather than during or after migration, you’ll give them the ability to play with, learn and develop the new tools that will be required. This can be on a static subset of data pulled from the existing setup, creating a sandbox where users can analyze and manipulate familiar data with new tools. That way, you’ll help drive the adoption of new tools early and build some excitement around them. (Your champions are a good resource for this.)

Throughout the process of moving to cloud, remember the benefits that shouldn’t be understated. No longer do your analyses need to take days. Instead, the answers can be there when you need them. This frees up analysts to create meaningful, useful data, rather than churning out the same reports over and over. It allows consumers of the data to access information more freely, without needing the help of a data analyst, by exposing dashboards and tools. But these high-level messages need to be supplemented with the personal needs of the team—show them the opportunities that exist and get them excited! It’ll help these big technological changes work for the people using the technology every day.



In a previous episode of CRE Life Lessons, we discussed how service level objectives (SLOs) are an important tool for defining and measuring the reliability of your service. There’s also a whole chapter in the SRE book about this topic. In this episode, we discuss how to define and manage SLOs for services with dependencies, each of which may (or may not!) have their own SLOs.

Any non-trivial service has dependencies. Some dependencies are direct: service A makes a Remote Procedure Call to service B, so A depends on B. Others are indirect: if B in turn depends on C and D, then A also depends on C and D, in addition to B. Still others are structurally implicit: a service may run in a particular Google Cloud Platform (GCP) zone or region, or depend on DNS or some other form of service discovery.

To make things more complicated, not all dependencies have the same impact. Outages for "hard" dependencies imply that your service is out as well. Outages for "soft" dependencies should have no impact on your service if they were designed appropriately. A common example is best-effort logging/tracing to an external monitoring system. Other dependencies are somewhere in between; for example, a failure in a caching layer might result in degraded latency performance, which may or may not be out of SLO.
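
To make the "soft dependency" pattern concrete, here is a minimal Python sketch of best-effort trace emission: the call gets a short timeout and failures are swallowed, so an outage in the monitoring system never propagates to the request path. The endpoint and function names are hypothetical illustrations, not any particular library's API.

import logging
import requests

TRACE_ENDPOINT = "https://monitoring.example.com/trace"  # hypothetical endpoint

def emit_trace(payload: dict) -> None:
    """Best-effort: failures are logged and ignored, never raised."""
    try:
        requests.post(TRACE_ENDPOINT, json=payload, timeout=0.2)
    except Exception:
        # Soft dependency: degrade silently instead of failing the request.
        logging.debug("trace emission failed; continuing without it")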

Take a moment to think about one of your services. Do you have a list of its dependencies, and what impact they have? Do the dependencies have SLOs that cover your specific needs?

Given all this, how can you as a service owner define SLOs and be confident about meeting them? Consider the following complexities:

  • Some of your dependencies may not even have SLOs, or their SLOs may not capture how you're using them.
  • The effect of a dependency's SLO on your service isn't always straightforward. In addition to the "hard" vs "soft" vs "degraded" impact discussed above, your code may complicate the effect of a dependency's SLOs on your service. For example, you have a 10s timeout on an RPC, but its SLO is based on serving a response within 30s. Or, your code does retries, and its impact on your service depends on the effectiveness of those retries (e.g., if the dependency fails 0.1% of all requests, does your retry have a 0.1% chance of failing or is there something about your request that means it is more than 0.1% likely to fail again?).
  • How to combine SLOs of multiple dependencies depends on the correlation between them. At the extremes, if all of your dependencies are always unavailable at the same time, then theoretically your unavailability is based on the max(), i.e., the dependency with the longest unavailability. If they are unavailable at distinct times, then theoretically your unavailability is the sum() of the unavailability of each dependency. The reality is likely somewhere in between; the sketch after this list works through a two-dependency example.
  • Services usually do better than their SLOs (and usually much better than their service level agreements), so using them to estimate your downtime is often too conservative.
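
To see those two bounds in a worked example, here is a small Python sketch for a service with two hypothetical hard dependencies at 99.9% and 99.95% availability; real correlations put you somewhere between the two numbers it prints.

# Unavailability of each hard dependency (99.9% and 99.95% available).
dep_unavailability = [0.001, 0.0005]

# Perfectly correlated outages: you only ever lose the worst dependency.
best_case = max(dep_unavailability)   # 0.0010 -> at best  99.90% available

# Perfectly disjoint outages: every outage hits you separately.
worst_case = sum(dep_unavailability)  # 0.0015 -> at worst 99.85% available

print(f"availability between {1 - worst_case:.2%} and {1 - best_case:.2%}")
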
At this point you may want to throw up your hands and give up on determining an achievable SLO for your service entirely. Don't despair! The way out of this thorny mess is to go back to the basics of how to define a good SLO. Instead of determining your SLO bottom-up ("What can my service achieve based on all of my dependencies?"), go top down: "What SLO do my customers need to be happy?" Use that as your SLO.

Risky business

You may find that you can consistently meet that SLO with the availability you get from your dependencies (minus your own home-grown sources of unavailability). Great! Your users are happy. If not, you have some work to do. Either way, the top-down approach of setting your SLO doesn't mean you should ignore the risks that dependencies pose to it. CRE tech lead Matt Brown gave a great talk at SRECon18 Americas about prioritizing risk (slides), including a risk analysis spreadsheet that you can use to help identify, communicate, and prioritize the top risks to your error budget (the talk expands on a previous CRE Life Lessons blog post).

Some of the main sources of risk to your SLO will of course come from your dependencies. When modeling the risk from a dependency, you can use its published SLO, or choose to use observed/historical performance instead: SLOs tend to be conservative, so using them will likely overestimate the actual risk. In some cases, if a dependency doesn't have a published SLO and you don't have historical data, you'll have to use your best guess. When modeling risk, also keep in mind the difficulties described above about mapping a dependency's SLO onto yours. If you're using the spreadsheet, you can try out different values (for example, the published SLO for a dependency versus the observed performance) and see the effect they have on your projected SLO performance.¹

Remember that you're making these estimates as a tool for prioritization; they don't have to be perfectly accurate, and your estimates won't result in any guarantees. However, the process should give you a better understanding of whether you're likely to consistently meet your SLO, and if not, what the biggest sources of risk to your error budget are. It also encourages you to document your assumptions, where they can be discussed and critiqued. From there, you can do a pragmatic cost/benefit analysis to decide which risks to mitigate.

For dependencies, mitigation might mean:
  • Trying to remove it from your critical path
  • Making it more reliable; e.g., running multiple copies and failing over between them
  • Automating manual failover processes
  • Replacing it with a more reliable alternative
  • Sharding it so that the scope of failure is reduced
  • Adding retries
  • Increasing (or decreasing, sometimes it is better to fail fast and retry!) RPC timeouts
  • Adding caching and using stale data instead of live data
  • Adding graceful degradation using partial responses
  • Asking for an SLO that better meets your needs
There may be very little you can do to mitigate unavailability from a critical infrastructure dependency, or it might be prohibitively expensive. Instead, mitigate other sources of error budget burn, freeing up error budget so you can absorb outages from the dependency.

A series of earlier CRE Life Lessons posts (1, 2, 3) discussed consequences and escalations for SLO violations, as a way to balance velocity and risk; an example of a consequence might be to temporarily block new releases when the error budget is spent. If an outage was caused by one of your service's dependencies, should the consequences still apply? After all, it's not your fault, right?!? The answer is "yes"—the SLO is your proxy for your users' happiness, and users don't care whose "fault" it is. If a particular dependency causes frequent violations to your SLO, you need to mitigate the risk from it, or mitigate other risks to free up more error budget. As always, you can be pragmatic about how and when to enforce consequences for SLO violations, but if you're regularly making exceptions, especially for the same cause, that's a sign that you should consider lowering your SLOs, or increasing the time/effort you are putting into improving reliability.

In summary, every non-trivial service has dependencies, probably many of them. When choosing an SLO for your service, don't think about your dependencies and what SLO you can achieve—instead, think about your users, and what level of service they need to be happy. Once you have an SLO, your dependencies represent sources of risk, but they're not the only sources. Analyze all of the sources of risk together to predict whether you'll be able to consistently meet your SLO and prioritize which risks to mitigate.

¹ If you’re interested, The Calculus of Service Availability has more in-depth discussion about modeling risks from dependencies, and strategies for mitigating them.



[Editor’s note: Mani Doraisamy built two products—Guesswork.co and CommerceDNA—on top of Google Cloud Platform. In this blog post he shares insights into how his application architecture evolved to support the changing needs of his growing customer base while still staying cost-effective.]

Guesswork is a machine learning startup that helps e-commerce companies in emerging markets recommend products for first-time buyers on their site. Large and established e-commerce companies can analyze their users' past purchase history to predict what product they are most likely to buy next and make personalized recommendations. But in developing countries, where e-commerce companies are mostly focused on attracting new users, there’s no history to work from, so most recommendation engines don’t work for them. Here at Guesswork, we can understand users and recommend them relevant products even if we don’t have any prior history about them. To do that, we analyze lots of data points about where a new user is coming from (e.g., did they come from an email campaign for t-shirts, or a fashion blog about shoes?) to find every possible indicator of intent. Thus far, we’ve worked with large e-commerce companies around the world such as Zalora (Southeast Asia), Galeries Lafayette Group (France) and Daraz (South Asia).

Building a scalable system to support this workload is no small feat. In addition to being able to process high data volumes for each customer, we also need to process hundreds of millions of users every month, plus any traffic spikes that happen during peak shopping seasons.

As a bootstrapped startup, we had three key goals while designing the system:

  1. Stay small. As a small team of three developers, we didn’t want to add any additional personnel even if we needed to scale up for a huge volume of users.
  2. Stay profitable. Our revenue is based on the performance of our recommendation engine. Instead of a recurring fee, customers pay us a commission on sales to their users that come from our recommendations. This business model made our application architecture and infrastructure costs a key factor in our ability to turn a profit.
  3. Embrace constraints. In order to increase our development velocity and stay flexible, we decided to trade off control over our development stack and embrace constraints imposed by managed cloud services.

These three goals turned into our motto: "I would rather optimize my code than fundraise." By turning our business goals into a coding problem, we also had so much more fun. I hope you will too, as I recount how we did it.

Choosing a database: The Three Musketeers

The first part of the stack we focused on was the database layer. Since we wanted to build on top of managed services, we decided to go with Google Cloud Platform (GCP)—a best-in-class option when it comes to scaling, in our opinion.

But, unlike traditional databases, cloud databases are not general purpose. They are specialized. So we picked three separate databases for transactional, analytical and machine learning workloads. We chose:

  • Cloud Datastore for our transactional database, because it can support a high number of writes. In our case, the user events are in the billions and are updated in real time in Cloud Datastore.
  • BigQuery to analyze user behaviour. For example, we understand from BigQuery that users coming from a fashion blog usually buy a specific type of formal shoes.
  • Vision API to analyze product images and categorize products. Since we work with e-commerce companies across different geographies, the product names and descriptions are in different languages, and categorizing products based on images is more efficient than text analysis. We use this data along with user behaviour data from BigQuery and Cloud Datastore to make product recommendations.

First take: the App Engine approach

Once we chose our databases, we moved on to selecting the front-end service to receive user events from e-commerce sites and update Cloud Datastore. We chose App Engine, since it is a managed service and scales well at our volumes. Once App Engine updates the user events in Cloud Datastore, we synchronize that data into BigQuery and our recommendation engine using Cloud Dataflow, another managed service that orchestrates different databases in real time (i.e., in streaming mode).

This architecture powered the first version of our product. As our business grew, our customers started asking for new features. One feature request was to send alerts to users when the price of a product changed. So, in the second version, we began listening for price changes on our e-commerce sites and triggering events to send alerts. The product’s price is already recorded as a user event in Cloud Datastore, but to detect a change:

  • We compare the price we receive in the user event with the product master and determine if there is a difference.
  • If there is a difference, we propagate it to the analytical and machine learning databases to trigger an alert and reflect that change in the product recommendation.

There are millions of user events every day. Comparing each user event data with product master increased the number of reads on our datastore dramatically. Since each Cloud Datastore read counts toward our GCP monthly bill, it increased our costs to an unsustainable level.
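
A rough Python sketch of that read pattern (the entity kind, property names and the downstream hook are hypothetical illustrations; the point is the one billed read per event):

from google.cloud import datastore

client = datastore.Client()

def handle_user_event(event: dict) -> None:
    # One billed Datastore read per incoming event: fetch the product master.
    key = client.key("Product", event["product_id"])  # hypothetical kind
    product = client.get(key)

    if product is not None and product["price"] != event["price"]:
        # Price changed: propagate to the analytical and ML databases.
        propagate_price_change(event["product_id"], event["price"])  # hypothetical hook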

Take two: the Cloud Functions approach

To bring down our costs, we had two options for redesigning our system:

  • Use memcache to load the product master in memory and compare the price/stock for every user event. With this option, we had no guarantee that memcache would be able to hold so many products in memory. So, we might miss a price change and end up with inaccurate product prices.
  • Use Cloud Firestore to record user events and product data. Firestore has an option to trigger Cloud Functions whenever there’s a change in the value of an entity. In our case, a price/stock change automatically triggers a cloud function that updates the analytical and machine learning databases. (A sketch of this trigger logic follows the list.)
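
Guesswork wrote its functions in JavaScript, but the handler logic is simple enough to hedge in a short Python sketch. The document fields, project and topic names, and the trigger wiring below are placeholders for illustration, not the production code: when Firestore reports a document change, compare the old and new prices and fan the change out over Cloud Pub/Sub.

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "price-changes")  # placeholders

def on_product_write(old_value: dict, new_value: dict) -> None:
    # Invoked by the Firestore trigger with the document's before/after states.
    if old_value.get("price") != new_value.get("price"):
        message = {
            "product_id": new_value["product_id"],
            "old_price": old_value.get("price"),
            "new_price": new_value["price"],
        }
        # Downstream functions update BigQuery and the recommendation engine.
        publisher.publish(topic_path, data=json.dumps(message).encode("utf-8"))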

During our redesign, Firestore and Cloud Functions were in alpha, but we decided to use them as it gave us a clean and simple architecture:

  • With Firestore, we replaced both App Engine and Datastore. Firestore was able to accept user requests directly from a browser without the need for a front-end service like App Engine. It also scaled well like Datastore.
  • We used Cloud Functions not only as a way to trigger price/stock alerts, but as an orchestration tool to synchronize data between Firestore, BigQuery and our recommendation engine.

It turned out to be a good decision, as Cloud Functions scaled extremely well, even in alpha. For example, we went from one to 20 million users on Black Friday. In this new architecture, Cloud Functions replaced Dataflow’s streaming functionality with triggers, while providing a more intuitive language (JavaScript) than Dataflow’s pipeline transformations. Eventually, Cloud Functions became the glue that tied all the components together.

What we gained

Thanks to the flexibility of our serverless microservice-oriented architecture, we were able to replace and upgrade components as the needs of our business evolved without redesigning the whole system. We achieved the key goal of being profitable by using the right set of managed services and keeping our infrastructure costs well below our revenue. And since we didn't have to manage any servers, we were also able to scale our business with a small engineering team and still sleep peacefully at night.

Additionally, we saw some great outcomes that we didn't initially anticipate:

  • We increased our sales commissions by improving recommendation accuracy

    The best thing that happened in this new version was the ability to A/B test new algorithms. For example, we found that users who browse e-commerce sites with an Android phone are more likely to buy products that are on sale. So, we included the user’s device as a feature in the recommendation algorithm and tested it with a small sample set. Since Cloud Functions are loosely coupled (with Cloud Pub/Sub), we could implement a new algorithm and redirect users based on their device and geography. Once the algorithm produced good results, we rolled it out to all users without taking down the system. With this approach, we were able to continuously improve the accuracy of our recommendations, increasing revenue.
  • We reduced costs by optimizing our algorithm

    As counterintuitive as it may sound, we also found that paying more money for compute didn’t improve accuracy. For example, we analyzed a month of a user’s events vs. the latest session’s events to predict what the user was likely to buy next. We found that the latest session was more accurate even though it had fewer data points. The simpler and more intuitive the algorithm, the better it performed. Since Cloud Functions are modular by design, we were able to refactor each module and reduce costs without losing accuracy.
  • We reduced our dependence on external IT teams and signed more customers 

    We work with large companies and depending on their IT team, it can take a long time to integrate our solution. Cloud Functions allowed us to implement configurable modules for each of our customers. For example, while working with French e-commerce companies, we had to translate the product details we receive in the user events into English. Since Cloud Functions supports Node.js, we enabled scriptable modules in JavaScript for each customer that allowed us to implement translation on our end, instead of waiting for the customer’s IT team. This reduced our go-live time from months to days, and we were able to sign up new customers who otherwise might not have been able to invest the necessary time and effort up-front.

Since Cloud Functions was alpha at the time, we did face challenges while implementing non-standard functionality such as running headless Chrome. In such cases, we fell back on App Engine flexible environment and Compute Engine. Over time though, the Cloud Functions product team moved most of our desired functionality back into the managed environment, simplifying maintenance and giving us more time to work on functionality.

Let a thousand flowers bloom

If there is one takeaway from this story, it is this: Running a bootstrapped startup that serves 100 million users with three developers was unheard of just five years ago. With the relentless pursuit of abstraction among cloud platforms, this has become a reality. Serverless computing is at the bleeding edge of this abstraction. Among serverless computing products, I believe Cloud Functions has a leg up on its competition because it stands on the shoulders of GCP's data products and their near-infinite scale. By combining simplicity with scale, Cloud Functions is the glue that makes GCP greater than the sum of its parts. The day has come when a bootstrapped startup can build a large-scale application like Gmail or Salesforce. You just read one such story—now it’s your turn :)


Kubernetes has some amazing primitives to help you deploy your applications, which let Kubernetes handle the heavy lifting of rolling out containerized applications. With Container Engine you can have your Kubernetes clusters up in minutes ready for your applications to land on them.

But despite the ease of standing up this fine-tuned deployment engine, there are many things that need to happen before deployments can even start. And once they’ve kicked off, you’ll want to make sure that your deployments have completed safely and in a timely manner.

To fill these gaps, developers often look to tools like Container Builder and Spinnaker to create continuous delivery pipelines.



We recently created a solutions guide that shows you how to build out a continuous delivery pipeline from scratch using Container Builder and Spinnaker. Below is an example continuous delivery pipeline that validates your software, builds it, and then carefully rolls it out to your users:



First, your developers tag your software and push it to a Git repository. When the tagged commit lands in your repository, Container Builder detects the change and begins the process of building and testing your application. Once your tests have passed, an immutable Docker image of your application is tagged and pushed to Container Registry. Spinnaker picks it up from here by detecting that a new Docker image has been pushed to your registry and starting the deployment process.

Spinnaker’s pipeline stages allow you to create complex flows to roll out changes. The example here uses a canary deployment to roll out the software to a small percentage of users, and then runs a functional validation of your application. Once those functional checks are complete in the canary environment, Spinnaker pauses the deployment pipeline and waits for a manual approval before it rolls out the application to the rest of your users. Before approving it, you may want to inspect some key performance indicators, wait for traffic in your application to settle or manually validate the canary environment. Once you’re satisfied with the changes, you can approve the release and Spinnaker completes rolling out your software.

As you can imagine, this exact flow won’t work for everyone. Thankfully Spinnaker and Container Builder give you flexible and granular stages that allow you to automate your release process while mapping it to the needs of your organization.

Get started by checking out the Spinnaker solution. Or visit the documentation to learn more about Spinnaker’s pipeline stages.



Many Google Cloud Platform (GCP) users are now migrating production workloads to Container Engine, our managed Kubernetes environment.  Container Engine supports Stackdriver logging on GCP by default, which uses Fluentd under the hood to send your logs to Stackdriver. 


You may also want to fully customize your Container Engine cluster’s Stackdriver logs with additional logging filters. If that describes you, check out this tutorial where you’ll learn how you can configure Fluentd in Container Engine to apply additional logging filters prior to sending your logs to Stackdriver.


Capturing logs from dedicated game server instances in a central location can be useful for troubleshooting, keeping track of instance runtimes and machine load, and capturing historical data that occurs during the lifetime of a game.

But collecting and making sense of these logs can be tricky, especially if you are launching the same game in multiple regions, or have limited resources on which to collect the logs themselves.

One possible solution to these problems is to collect your logs in the cloud. Doing this enables you to mine your data with tools that deliver speed and power not possible from an on-premises logging server. Storage and data management are simple in the cloud and not bound by physical hardware. Additionally, you can access cloud logging resources globally. Studios and BI departments across the globe can access the same logging database regardless of physical location, making collaboration for distributed teams significantly easier.


We recently put together a tutorial that shows you how to integrate Stackdriver Logging, our hosted log management and analysis service for data running on Google Cloud Platform (GCP) and AWS, into your own dedicated game server environment. It also offers some key storage strategies, including how to migrate this data to BigQuery and other Google Cloud tools. Check it out, and let us know what other Google Cloud tools you’d like to learn how to use in your game operations. You can reach me on Twitter at @gcpjoe.



Many Google Cloud Platform (GCP) users are now migrating production workloads to Container Engine, our managed Kubernetes environment. You can spin up a Container Engine cluster for development, then quickly start porting your applications. First and foremost, a production application must be resilient and fault tolerant and deployed using Kubernetes best practices. You also need to prepare the Kubernetes environment for production by hardening it. As part of the migration to production, you may need to lock down who or what has access to your clusters and applications, both from an administrative as well as network perspective.

We recently created a guide that will help you with the push towards production on Container Engine. The guide walks through various patterns and features that allow you to lock down your Container Engine workloads. The first half focuses on how to control administrative access to the cluster using IAM and Kubernetes RBAC. The second half dives into network access patterns, teaching you how to properly configure your environment and Kubernetes services. With the IAM and networking models locked down appropriately, you can rest assured that you're ready to start directing your users to your new applications.

Read the full solution guide for using Container Engine for production workloads, or learn more about Container Engine from the documentation.



Many of today's most successful games are played in small sessions on the devices in our pockets. Players expect to open the game app from any of their supported devices and find themselves right where they left off. In addition, players may be very sensitive to delays caused by waiting for the game to save their progress during play. For mobile game developers, all of this adds up to the need for a persistent data store that can be accessed with consistently low latency.

Game developers with database experience are usually most comfortable with relational databases as their backend game state storage. MySQL, with its ACID-compliant transactions and well-understood semantics, offers a known pattern. However, "game developer" and "database administrator" are different titles for a reason; game developers may not relish standing up and administering a database when they could be building new game content and features. That’s why Google Cloud Platform offers high-performance, fully-managed MySQL instances in the form of Google Cloud SQL Second Generation to help handle your mobile game's persistent storage.

Many game developers ask for guidance about how much player load (concurrent users in a game) Cloud SQL can handle. In order to provide a starting point for these discussions, we recently published a new solutions document that details a simple mock game stress-testing framework built on Google Cloud Platform and Cloud SQL Second Generation. For a data model, we looked to the data schema and access patterns of popular massively single-player social games such as Puzzle and Dragons™ or Monster Strike™ for our testing framework. We also made the source code for the framework available so you can have a look at whether the simulated gameplay patterns and the data model are similar to your game’s. The results should provide a starting point for deciding if Cloud SQL Second Generation's performance is the right fit for your next game project's concurrent user estimates.
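
For a feel of the write path the framework exercises, here is a minimal Python sketch of a save-game write against a Cloud SQL (MySQL) instance. The pymysql driver, connection details and table layout are assumptions for illustration, not the solution's actual schema.

import json
import pymysql

# Placeholder connection details for a Cloud SQL MySQL instance.
conn = pymysql.connect(host="10.0.0.5", user="game",
                       password="...", database="gamedb")

def save_player_state(player_id: int, state: dict) -> None:
    # One small transaction per save keeps per-action latency low.
    with conn.cursor() as cur:
        cur.execute(
            "REPLACE INTO player_state (player_id, state_json) VALUES (%s, %s)",
            (player_id, json.dumps(state)),
        )
    conn.commit()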

For more information about Cloud SQL Second Generation, have a look at the documentation. If you'd like to see more solutions, check out the gaming solutions page.




If you're running a website and considering moving your web serving infrastructure to the cloud, Google Cloud Platform offers a variety of great options. But with so many products and services, it can be hard to figure out what's right for your particular needs. To help you understand the landscape of web hosting options, we recently published a new overview, our Serving Websites guide.

This guide starts with the idea that you're probably already running a site and/or understand a particular set of technologies, such as using a LAMP stack or hosting static pages. The guide tries to meet you where you're at to show you how your current infrastructure and knowledge can map to GCP computing and hosting products, and then links off to relevant documentation, solutions and tutorials that go deeper into the details.

The guide covers the following four main options:



  • Google Cloud Storage: Deliver static web pages and assets from a Cloud Storage bucket. This is the simplest option on GCP, and you get automatic scaling with no additional effort.
  • Google Compute Engine: Install, configure, and maintain your own web hosting stack. You have control of every component, but you also have all the responsibility to keep things running. You also must decide how to provide for load balancing and scalability, from a variety of options.
  • Google Container Engine: Use container technology to package your dependencies with your code for easier deployment. Then, use Container Engine to manage clusters of your containers.
  • Google App Engine: Focus on your code, deploy to App Engine, and let Google manage the systems for you. You have a choice of the standard environment, which prescribes the languages and runtimes that you can use, and the flexible environment, which gives you additional options but requires some self-management.

For each option, the guide provides information about things like scalability, load balancing, DevOps, logging and monitoring.

We hope you find this article useful and that it makes learning about GCP enjoyable. Please tell us what you think, and be sure to sign up for a free trial!



Last year, we published a guide for our customers who had familiarity and expertise with AWS but wanted to learn how it compares to Google Cloud Platform. The guide had a really positive reception, helping customers understand things like how Cloud Platform delivers Infrastructure as a Service with Google Compute Engine and how our VPN works.

Today, we're happy to announce a major expansion to the Cloud Platform for AWS Professionals guide, with new sections covering Big Data services, Storage services and Containers as a Service (Google Container Engine).
  • Amazon ECS vs. Google Container Engine at a glance
  • How Amazon Elastic MapReduce compares to Google Cloud Dataproc and Cloud Dataflow
As we said last year, this guide is a work-in-progress. We have some ideas about what topics we’d like to tackle next (services like Databases and Development tools) but we’d also love to hear what you think we should cover.

We hope you find this information useful and that it makes learning about Cloud Platform enjoyable. Please tell us what you think, and be sure to sign up for a free trial!



We like to think that Google Cloud Platform is one of the best places to run high-performance, highly available database deployments, and MongoDB is no exception. In particular, with an array of standard and customizable machine types, blazing-fast persistent disks and a high-performance global network, Google Compute Engine is a great option for MongoDB deployments, which can then be combined with managed big data services like Google BigQuery, Cloud Dataproc and Cloud Dataflow to support all manner of modern data workloads.

There are a number of ways to deploy MongoDB on Cloud Platform, including (but not limited to):
  • Creating Compute Engine instances and manually installing/configuring MongoDB
  • Using Google Cloud Launcher to quickly create and test drive a MongoDB replica set
  • Provisioning Compute Engine instances and using MongoDB Cloud Manager to install, configure and manage MongoDB deployments
Today we’re taking things one step further and introducing updated documentation and Cloud Deployment Manager templates to bootstrap MongoDB deployments using MongoDB Cloud Manager. Using the templates, you can quickly deploy multiple Compute Engine instances, each with an attached persistent SSD, that will download and install the MongoDB Cloud Manager agent on startup. Once the setup process is complete, you can head over to MongoDB Cloud Manager and deploy, upgrade and manage your cluster easily from a single interface.

By default, the Deployment Manager templates are set to launch three Compute Engine instances for a replica set, but they could just as easily be updated to launch more instances if you’re interested in deploying a sharded cluster.

Check out the documentation and sample templates to get started deploying MongoDB on Cloud Platform. Feedback is welcome and appreciated; comment here, submit a pull request, create an issue or find me on Twitter @crcsmnky and let me know how I can help.




When our customers work with telemetry data from large fleets of vehicles or big deployments of sensors that move about in the world, they typically have to combine multiple Google APIs to capture, process, store, analyze and visualize their data at scale. We recently built a scalable geolocation telemetry system with just Google Cloud Pub/Sub, Google BigQuery and the Google Maps APIs. The solution comes with a full tutorial, Docker images you can use straight away and some sample data to test it with.
The sample solution retrieves data using BigQuery and renders it as a heat map indicating density.
We chose Google Cloud Pub/Sub to handle the incoming messages from vehicle or device sensors as it is a serverless system that scales to handle many thousands of messages at once with minimal configuration. Just create a topic and start adding messages to it.
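
Publishing really is that small a job. A minimal Python sketch (project and topic names are placeholders):

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "vehicle-locations")  # placeholders

reading = {"vehicle_id": "truck-42", "lat": 51.5074, "lng": -0.1278,
           "ts": "2017-06-01T12:00:00Z"}

future = publisher.publish(topic_path, data=json.dumps(reading).encode("utf-8"))
future.result()  # block until Pub/Sub has accepted the message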

Google BigQuery offers petabyte-scale, serverless data warehousing and analytics, ideal for large fleets of vehicles that will send thousands of messages a second, year after year. Further, BigQuery can perform simple spatial queries to select by location or do geofencing on vast datasets, all in a few seconds.
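
For instance, selecting one city's traffic with a bounding box is a single query. A hedged sketch with the BigQuery Python client (the table and column names are placeholders in the spirit of the tutorial, not its exact schema):

from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT vehicle_id, lat, lng, ts
    FROM `my-project.telemetry.locations`  -- placeholder table
    WHERE lat BETWEEN 51.28 AND 51.69      -- rough London bounding box
      AND lng BETWEEN -0.51 AND 0.33
"""
for row in client.query(sql):
    print(row.vehicle_id, row.lat, row.lng)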

The Google Maps APIs add an extra dimension to telemetry data by converting raw GPS position into human-readable structured address data, as well as adding other really useful local context such as elevation (great for fuel consumption analysis) and local time-zone (maybe you want to just see locations recorded during working hours for a given location). Google Maps also provides an interface with which the majority of your staff, customers or users are familiar.
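
Turning a raw position into an address is one HTTP call to the Geocoding API. A minimal sketch using the requests library; you supply your own API key:

import requests

def reverse_geocode(lat: float, lng: float, api_key: str) -> str:
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"latlng": f"{lat},{lng}", "key": api_key},
        timeout=5,
    )
    results = resp.json()["results"]
    return results[0]["formatted_address"] if results else "unknown"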

Finally we packaged our solution using Docker so that you can just take it and start working with it right away. (Of course if you’d rather just run the code on a server or your local machine you can do this as well; it’s written in Python and can be run from the command line.)

To get started, read the solution document, then head on over to the tutorial to explore the sample application and data. Once you’ve had a play, fork the code on GitHub and start working with your own telemetry data!





When choosing to host an application on Google Cloud Platform, one of the first decisions organizations make is which compute offering to choose from: Google Compute Engine (GCE)? Google App Engine (GAE)? Google Container Engine (GKE)? There’s no right answer — it all depends on your developers’ preferences, what kind of functionality the application requires, and the use case.

Our new “Choosing a Computing Option” guide is a convenient way to visualize all these options at a glance, to help you make the best choice for your application. Then again, you may not want to choose. There’s nothing stopping you from choosing multiple compute options, across different application tiers.

And going a step further, if you’re comparing compute options across cloud providers, these resources will be helpful:
If you’ve built an application that spans GCE, GAE and GKE or across other clouds, send us a note on Twitter @googlecloud. We’d love to hear more!

Google Cloud Identity & Access Management (IAM) service gives you additional capabilities to secure access to your Google Cloud Platform resources. To assist you when designing your IAM strategy, we've created a set of best practice guides.

The best practices guides include:


The “Using IAM Securely” guide will help you to implement IAM controls securely by providing a checklist of best practices for the most common areas of concern when using IAM. It categorizes best practices into four sections:

  • Least privilege - A set of checks that assist you in restricting your users or applications to not do more than they're supposed to.
  • Managing Service Accounts and Service Account keys - Provides pointers to help you manage both securely.
  • Auditing - Practices that remind you to use Audit Logs and to grant logging roles appropriately.
  • Policy Management - Some checks to ensure that you're implementing and managing your policies appropriately.

Cloud Platform resources are organized hierarchically and IAM policies can propagate down the structure. You're able to set IAM policies at the following levels of the resource hierarchy:

  • Organization level. The Organization resource represents your company. IAM roles granted at this level are inherited by all resources under the organization.
  • Project level. Projects represent a trust boundary within your company. Services within the same project have a default level of trust. For example, App Engine instances can access Cloud storage buckets within the same project. IAM roles granted at the project level are inherited by resources within that project.
  • Resource level. In addition to the existing Google Cloud Storage and Google BigQuery ACL systems, additional resources such as Google Genomics Datasets and Google Cloud Pub/Sub topics support resource-level roles so that you can grant certain users permission to a single resource. 
The diagram below illustrates an example of a Cloud Platform resource hierarchy:



The “Designing Resource Hierarchies” guide provides examples of what this means in practice and has a handy checklist to double-check that you're following best practice.

A Service Account is a special type of Google account that belongs to your application or a virtual machine (VM), instead of to an individual end user. The “Understanding Service Accounts” guide provides answers to the most common questions, like:

  • What resources can the service account access?
  • What permissions does it need?
  • Where will the code assuming the identity of the service account be running: on Google Cloud Platform or on-premises?

This guide discusses what the implications are of making certain decisions so that you have enough information to use Service Accounts safely and efficiently.

We’ll be producing more IAM best practice guides, and we’re keen to hear from customers using IAM, or wanting to use IAM, about what additional content would be helpful. We’re also keen to hear if there are curated roles we haven’t thought of. We want Cloud Platform to be the most secure and the easiest cloud to use, so your feedback is important to us and helps us shape our approach. Please share your feedback with us at:

GCP-iam-feedback@google.com

- Posted by Grace Mollison, Solutions Architect

If you knew what happened in the London markets, how accurately could you predict what will happen in New York? It turns out, this is a great scenario to be tackled by machine learning!

The premise is that by following the sun and using data from markets that close earlier, such as London, which closes 4.5 hours ahead of New York, you can predict market behavior correctly about 7 out of 10 times.

We’ve published a new solution, TensorFlow Machine Learning with Financial Data on Google Cloud Platform, that looks at this problem. We hope you’ll enjoy exploring it with us interactively in the Google Cloud Datalab notebook we provide.

As you go through the solution, you’ll query six years of time series data for eight different markets using Google BigQuery, explore that data using Cloud Datalab, then produce two powerful TensorFlow models on Cloud Platform.

TensorFlow is Google’s next generation machine learning library, allowing you to build high performance, state-of-the-art, scalable deep learning models. Cloud Platform provides the compute and storage on demand required to build, train and test those models. The two together are a marriage made in heaven and can provide a tremendous force multiplier for your business.
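
To give a flavor of what the notebook builds, here is a deliberately minimal sketch in today's Keras API rather than the notebook's original TensorFlow code; the feature and label arrays are random placeholders standing in for the markets' time series.

import numpy as np
import tensorflow as tf

# Placeholder data: one row per trading day; columns are same-day returns
# of markets that closed before New York (e.g., FTSE, DAX, N225, ...).
features = np.random.randn(1000, 7).astype("float32")
labels = (np.random.rand(1000) > 0.5).astype("float32")  # 1 = S&P 500 closed up

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(7,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(features, labels, epochs=10, batch_size=32, verbose=0)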

This solution is intended to illustrate the capabilities of Cloud Platform and TensorFlow for fast, interactive, iterative data analysis and machine learning. It does not offer any advice on financial markets or trading strategies. The scenario presented in the tutorial is an example. Don't use this code to make investment decisions.

- Posted by Corrie Elston, Solutions Architect, Google Cloud Platform

You might like this blog post . . . if you like recommendation engines. If that sentence has a familiar ring, you've probably browsed many websites that use a recommendation engine.

Recommendation engines are the technology behind content discovery networks and the suggestion features of most ecommerce websites. They improve a visitor’s experience by offering relevant items at the right time and on the right page. Adding that intelligence makes your application more attractive, enhances the customer experience and increases their satisfaction. Digital Trends studies show that 73% of customers prefer a personalized shopping experience.

There are various components to a recommendation engine, ranging from data ingestion and analytics to machine learning algorithms. In order to provide relevant recommendations, the system must be scalable and able to handle the demands that come with processing Big Data and must provide an easy way to improve the algorithms.

Recommendation engines, particularly the scalable ones that produce great suggestions, are highly compute-intensive workloads. The following features of Google Cloud Platform are well-suited to support this kind of workload:


Customers building recommendation engines are jumping on board. Antvoice uses Google Cloud Platform to deploy their self-learning, multi-channel, predictive recommendation platform.

This new solution article provides an introduction to implementing product recommendations. It shows you how you can use open-source technologies to set up a basic recommendation engine on Cloud Platform. It uses the example of a house-renting website that suggests houses the user might be interested in based on their previous behavior, through a technique known as collaborative filtering.

To provide recommendations, whether in real time while customers browse your site, or through email later on, several things need to happen. At first, while you know little about your users' tastes and preferences, you might base recommendations on item attributes alone. But your system needs to be able to learn from your users, collecting data about their tastes and preferences.

Over time and with enough data, you can use machine learning algorithms to perform useful analysis and deliver meaningful recommendations. Other users’ input can also improve the results, effectively retraining the system periodically. This solution deals with a recommendation system that already has enough data to benefit from machine learning algorithms.

A recommendation engine typically processes data through the following four phases:


The following diagram represents the architecture of such a system:

Each component of this architecture can be deployed using various easy-to-implement technologies to get you started:

  • Front-End: By deploying a simple application on Google App Engine, a user can see a page with top recommendations. You can take it from there, easily building a strong and scalable web platform that can manage one to several millions of users, with minimum operations.
  • Storage: The solution uses Google Cloud SQL, our managed MySQL option. A commonly used database in the ecommerce domain, this database integrates well with MLlib, a machine learning library.
  • Machine learning: Using Google Cloud Dataproc or bdutil, two options that simplify deployment and management of Hadoop/Spark clusters, you'll deploy and run MLlib-based scripts. (A sketch of the MLlib training step follows this list.)
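
Here is a hedged PySpark sketch of that MLlib training step using ALS, the collaborative-filtering algorithm the solution relies on; the JDBC connection details and column names are placeholders, not the solution's exact code.

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("recommendations").getOrCreate()

# Placeholder source: (userId, itemId, rating) rows loaded from Cloud SQL.
ratings = spark.read.format("jdbc").options(
    url="jdbc:mysql://10.0.0.5/recommendations",  # placeholder instance
    dbtable="Rating", user="app", password="...").load()

als = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
          rank=20, regParam=0.1, maxIter=10)
model = als.fit(ratings)

# Top five suggestions per user, ready to write back for the front end.
top5 = model.recommendForAllUsers(5)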

The solution also discusses considerations for how to analyze the data, including:

  • Timeliness concerns, such as real-time, near-real-time, and batch data analysis. This information can help you understand your options for how quickly you can present recommendations to the user and what it takes to implement each option. The sample solution focuses mainly on a near-real-time approach.
  • Filtering methods, such as content-based, cluster and collaborative filtering. You'll need to decide exactly what information goes into making a recommendation, and these filtering methods are the common ones in use today. The sample solution focuses mainly on collaborative filtering, but a helpful appendix provides more information about the other options.

We hope that this solution will give you the nuts and bolts you need to build an intelligent and ever-improving application that makes the most of the information that your users give you. Happy reading!

If you liked this blog post . . . you can get started today following these steps:
  1. Sign up for a free trial
  2. Download and follow the instructions on the Google Cloud Platform Github page
  3. “Recommend” this solution to your friends.

- Posted by Matthieu Mayran, Cloud Solutions Architect

At some point in development, nearly every mobile app needs a backend service. With Google’s services you can rapidly build backend services that:

  • Scale automatically to meet demand
  • Automatically synchronize data across devices
  • Handle the offline case gracefully
  • Send notifications and messages

The following are design patterns you’ll find in Build mobile apps using Google Cloud Platform, which provides a side-by-side comparison of Google services, as well as links to tutorials and sample code. Click on a diagram for more information and links to sample code.

Real-time data synchronization with Firebase

Firebase is a fully managed platform for building iOS, Android and web apps that provides automatic data synchronization and authentication services.

To understand how using Firebase can simplify app development, consider a chat app. By storing the data in Firebase, you get the benefits of automatic synchronization of data across devices, minimal on-device storage, and an authentication service. All without having to write a backend service.
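
For a sense of how little backend code that takes, here is a hedged Python sketch using the Firebase Admin SDK to append a chat message; the database URL and message shape are placeholders. Clients on iOS, Android and the web write the same path through their own SDKs and receive each other's updates automatically.

import firebase_admin
from firebase_admin import credentials, db

cred = credentials.ApplicationDefault()
firebase_admin.initialize_app(cred, {
    "databaseURL": "https://my-chat-app.firebaseio.com",  # placeholder
})

# Push a message; every connected client sees it in real time.
db.reference("rooms/lobby/messages").push({
    "from": "alice",
    "text": "Hello, world!",
})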

Add managed computation to Firebase apps with Google App Engine

If your app needs backend computation to process user data or orchestrate events, extending Firebase with App Engine gives you the benefit of automatic real-time data synchronization and an application platform that monitors, updates and scales the hosting environment.

An example of how you can use Firebase with App Engine is an app that implements a to-do list. Using Firebase to store the data ensures that the list is updated across devices. Connecting to your Firebase data from a backend service running on App Engine gives you the ability to process or act on that data; in the case of the to-do app, that means sending daily reminder emails.


Add flexible computation to Firebase with App Engine Managed VMs

If your mobile backend service needs to call native binaries, write to the file system and make other system calls, extending Firebase with App Engine Managed VMs gives you the benefit of automatic real-time data synchronization and an application platform, with the flexibility to run code outside of the standard App Engine runtime.

Using Firebase and App Engine Managed VMs is similar to using Firebase with App Engine and adds additional options. For example, consider an app that converts chat messages into haikus using a pre-existing native binary. You can use Firebase to store and synchronize the data and connect to that data from a backend service running on App Engine Managed VMs. Your backend service can then detect new messages, call the native binaries to translate them into poetry, and push the new versions back to Firebase.


Automatically generate client libraries with App Engine and Google Cloud Endpoints

Using Cloud Endpoints means you don’t have to write wrappers to handle communication with App Engine. With the client libraries generated by Cloud Endpoints, you can simply make direct API calls from your mobile app.

If you're building an app that does not require real-time data synchronization, or if messaging and synchronization are already part of your backend service, using App Engine with Cloud Endpoints speeds development time by automatically generating client libraries. An example of an app where real-time synchronization is not needed is one that looks up information about retail products and finds nearby store locations.

Have full control with Compute Engine and REST or gRPC

With Google Compute Engine, you create and run virtual machines on Google infrastructure. You have administrator rights to the server and full control over its configuration.

If you have an existing backend service running on a physical or virtual machine, and that service requires a custom server configuration, moving your service to Compute Engine is the fastest way to get your code running on Cloud Platform. Keep in mind that you will be responsible for maintaining and updating your virtual machine.

An example of an app you might run on Compute Engine is an app with a backend service that uses third-party libraries and a custom server configuration.

For more information about these designs, as well as information about building your service, testing and monitoring your service and connecting to your service from your mobile app — including sending push notifications — see How to build backend services for mobile apps.

- Posted by Syne Mitchell, Technical Writer, Google Cloud Platform