Google Online Security Blog: November 2018

Announcing the Google Security and Privacy Research Awards

November 29, 2018

Posted by Elie Bursztein and Oxana Comanescu, Google Security and Privacy Group

We believe that cutting-edge research plays a key role in advancing the security and privacy of users across the Internet. While we do significant in-house research and engineering to protect users’ data, we maintain strong ties with academic institutions worldwide. We provide seed funding through faculty research grants, cloud credits to unlock new experiments, and foster active collaborations, including working with visiting scholars and research interns.

To accelerate the next generation of security and privacy breakthroughs, we recently created the Google Security and Privacy Research Awards program. These awards, selected via internal Google nominations and voting, recognize academic researchers who have made recent, significant contributions to the field.

We’ve been developing this program for several years. It began as a pilot when we awarded researchers for their work in 2016, and we expanded it more broadly for work from 2017. So far, we awarded $1 million dollars to 12 scholars. We are preparing the shortlist for 2018 nominees and will announce the winners next year. In the meantime, we wanted to highlight the previous award winners and the influence they’ve had on the field.
2017 Awardees

Lujo Bauer, Carnegie Mellon University
Research area: Password security and attacks against facial recognition

Dan Boneh, Stanford University
Research area: Enclave security and post-quantum cryptography

Aleksandra Korolova, University of Southern California
Research area: Differential privacy

Daniela Oliveira, University of Florida
Research area: Social engineering and phishing

Franziska Roesner, University of Washington
Research area: Usable security for augmented reality and at-risk populations

Matthew Smith, Universität Bonn
Research area: Usable security for developers

2016 Awardees

Michael Bailey, University of Illinois at Urbana-Champaign
Research area: Cloud and network security

Nicolas Christin, Carnegie Mellon University
Research area: Authentication and cybercrime

Damon McCoy, New York University
Research area: DDoS services and cybercrime

Stefan Savage, University of California San Diego
Research area: Network security and cybercrime

Marc Stevens, Centrum Wiskunde & Informatica
Research area: Cryptanalysis and lattice cryptography

Giovanni Vigna, University of California Santa Barbara
Research area: Malware detection and cybercrime

Congratulations to all of our award winners.

Industry collaboration leads to takedown of the “3ve” ad fraud operation

November 27, 2018

Posted by Per Bjorke, Product Manager, Ad Traffic Quality

For years, Google has been waging a comprehensive, global fight against invalid traffic through a combination of technology, policy, and operations teams to protect advertisers and publishers and increase transparency throughout the advertising industry.

Last year, we identified one of the most complex and sophisticated ad fraud operations we have seen to date, working with cyber security firm White Ops, and referred the case to law enforcement. Today, the U.S. Attorney’s Office for the Eastern District of New York announced criminal charges associated with this fraud operation. This takedown marks a major milestone in the industry’s fight against ad fraud, and we’re proud to have been a key contributor.

In partnership with White Ops, we have published a white paper about how we identified this ad fraud operation, the steps we took to protect our clients from being impacted, and the technical work we did to detect patterns across systems in the industry. Below are some of the highlights from the white paper, which you can download here.

All about 3ve: A creative and sophisticated threat

Referred to as 3ve (pronounced “Eve”), this ad fraud operation evolved over the course of 2017 from a modest, low-level botnet into a large and sophisticated operation that used a broad set of tactics to commit ad fraud. 3ve operated on a significant scale: At its peak, it controlled over 1 million IPs from both residential malware infections and corporate IP spaces primarily in North America and Europe.

Through our investigation, we discovered that 3ve was comprised of three unique sub-operations that evolved rapidly, using sophisticated tactics aimed at exploiting data centers, computers infected with malware, spoofed fraudulent domains, and fake websites. Through its varied and complex machinery, 3ve generated billions of fraudulent ad bid requests (i.e., ad spaces on web pages that advertisers can bid to purchase in an automated way), and it also created thousands of spoofed fraudulent domains. It should be noted that our analysis of ad bid requests indicated growth in activity, but not necessarily growth in transactions that would result in charges to advertisers. It’s also worth noting that 3+ billion daily ad bid requests made 3ve an extremely large ad fraud operation, but its bid request volume was only a small percentage of overall bid request volume across the industry.

Our objective

Trust and integrity are critical to the digital advertising ecosystem. Investments in our ad traffic quality systems made it possible for us to tackle this ad fraud operation and to limit the impact it had on our clients as quickly as possible, including crediting advertisers.

3ve’s focus, like many ad fraud schemes, was not a single player or system, but rather the whole advertising ecosystem. As we worked to protect our ad systems against traffic from this threat, we identified that others also had observed this traffic, and we partnered with them to help remove the threat from the ecosystem. The working group, which included nearly 20 partners, was a key component that shaped our broader investigation into 3ve, enabling us to engage directly with each other and to work towards a mutually beneficial outcome.

Industry collaboration helps bring 3ve down

While ad fraud traditionally has been seen as a faceless crime in which bad actors don’t face much risk of being identified or consequences for their actions, 3ve’s takedown demonstrates that there are risks and consequences to committing ad fraud. We’re confident that our collective efforts are building momentum and moving us closer to finding a resolution to this challenge.

For example, industry initiatives such as the Interactive Advertising Bureau (IAB) Tech Lab’s ads.txt standard, which has experienced and continues to see very rapid adoption (over 620,000 domains have an ads.txt), as well as the increasing number of buy-side platforms and exchanges offering refunds for invalid traffic, are valuable steps towards cutting off the money flow to fraudsters. As we announced last year, we’ve made, and will continue to make investments in our automated refunds for invalid traffic, including our work with supply partners to provide advertisers with refunds for invalid traffic detected up to 30 days after monthly billing.

Industry bodies such as the IAB, Trustworthy Accountability Group (TAG), Media Rating Council, and the Joint Industry Committee for Web Standards, who are serving as agents of change and collaboration across our industry, are instrumental in the fight against ad fraud. We have a long history of working with these bodies, including ongoing participation in TAG and IAB leadership and working groups, as well as our inclusion in the TAG Certified Against Fraud program. That program’s value was reinforced with the IAB’s requirement that all members need to be TAG certified by the middle of this year.

Successful disruption

A coordinated takedown of infrastructure related to 3ve’s operations occurred recently. The takedown involved disrupting as much of the related infrastructure as possible to make it hard to rebuild any of 3ve’s operations. As the graph below demonstrates, declining volumes in invalid traffic indicate that the disruption thus far has been successful, bringing the bid request traffic close to zero within 18 hours of starting the coordinated takedown.

Looking ahead

We’ll continue to be vigilant, working to protect marketers, publishers, and users, while continuing to collaborate with the broader industry to safeguard the integrity of the digital advertising ecosystem that powers the open web. Our work to take down 3ve is another example of our collaboration with the broader ecosystem to improve trust in digital advertising. We are committed to helping to create a better digital advertising ecosystem — one that is more valuable, transparent, and trusted for everyone.

Combating Potentially Harmful Applications with Machine Learning at Google: Datasets and Models

November 15, 2018

Posted by Mo Yu, Damien Octeau, and Chuangang Ren, Android Security & Privacy Team

[Cross-posted from the Android Developers Blog]

In a previous blog post, we talked about using machine learning to combat Potentially Harmful Applications (PHAs). This blog post covers how Google uses machine learning techniques to detect and classify PHAs. We'll discuss the challenges in the PHA detection space, including the scale of data, the correct identification of PHA behaviors, and the evolution of PHA families. Next, we will introduce two of the datasets that make the training and implementation of machine learning models possible, such as app analysis data and Google Play data. Finally, we will present some of the approaches we use, including logistic regression and deep neural networks.

Using Machine Learning to Scale

Detecting PHAs is challenging and requires a lot of resources. Our security experts need to understand how apps interact with the system and the user, analyze complex signals to find PHA behavior, and evolve their tactics to stay ahead of PHA authors. Every day, Google Play Protect (GPP) analyzes over half a million apps, which makes a lot of new data for our security experts to process.

Leveraging machine learning helps us detect PHAs faster and at a larger scale. We can detect more PHAs just by adding additional computing resources. In many cases, machine learning can find PHA signals in the training data without human intervention. Sometimes, those signals are different than signals found by security experts. Machine learning can take better advantage of this data, and discover hidden relationships between signals more effectively.

There are two major parts of Google Play Protect's machine learning protections: the data and the machine learning models.

Data Sources

The quality and quantity of the data used to create a model are crucial to the success of the system. For the purpose of PHA detection and classification, our system mainly uses two anonymous data sources: data from analyzing apps and data from how users experience apps.

App Data

Google Play Protect analyzes every app that it can find on the internet. We created a dataset by decomposing each app's APK and extracting PHA signals with deep analysis. We execute various processes on each app to find particular features and behaviors that are relevant to the PHA categories in scope (for example, SMS fraud, phishing, privilege escalation). Static analysis examines the different resources inside an APK file while dynamic analysis checks the behavior of the app when it's actually running. These two approaches complement each other. For example, dynamic analysis requires the execution of the app regardless of how obfuscated its code is (obfuscation hinders static analysis), and static analysis can help detect cloaking attempts in the code that may in practice bypass dynamic analysis-based detection. In the end, this analysis produces information about the app's characteristics, which serve as a fundamental data source for machine learning algorithms.

Google Play Data

In addition to analyzing each app, we also try to understand how users perceive that app. User feedback (such as the number of installs, uninstalls, user ratings, and comments) collected from Google Play can help us identify problematic apps. Similarly, information about the developer (such as the certificates they use and their history of published apps) contribute valuable knowledge that can be used to identify PHAs. All these metrics are generated when developers submit a new app (or new version of an app) and by millions of Google Play users every day. This information helps us to understand the quality, behavior, and purpose of an app so that we can identify new PHA behaviors or identify similar apps.

In general, our data sources yield raw signals, which then need to be transformed into machine learning features for use by our algorithms. Some signals, such as the permissions that an app requests, have a clear semantic meaning and can be directly used. In other cases, we need to engineer our data to make new, more powerful features. For example, we can aggregate the ratings of all apps that a particular developer owns, so we can calculate a rating per developer and use it to validate future apps. We also employ several techniques to focus in on interesting data.To create compact representations for sparse data, we use embedding. To help streamline the data to make it more useful to models, we use feature selection. Depending on the target, feature selection helps us keep the most relevant signals and remove irrelevant ones.

By combining our different datasets and investing in feature engineering and feature selection, we improve the quality of the data that can be fed to various types of machine learning models.

Models

Building a good machine learning model is like building a skyscraper: quality materials are important, but a great design is also essential. Like the materials in a skyscraper, good datasets and features are important to machine learning, but a great algorithm is essential to identify PHA behaviors effectively and efficiently.

We train models to identify PHAs that belong to a specific category, such as SMS-fraud or phishing. Such categories are quite broad and contain a large number of samples given the number of PHA families that fit the definition. Alternatively, we also have models focusing on a much smaller scale, such as a family, which is composed of a group of apps that are part of the same PHA campaign and that share similar source code and behaviors. On the one hand, having a single model to tackle an entire PHA category may be attractive in terms of simplicity but precision may be an issue as the model will have to generalize the behaviors of a large number of PHAs believed to have something in common. On the other hand, developing multiple PHA models may require additional engineering efforts, but may result in better precision at the cost of reduced scope.

We use a variety of modeling techniques to modify our machine learning approach, including supervised and unsupervised ones.

One supervised technique we use is logistic regression, which has been widely adopted in the industry. These models have a simple structure and can be trained quickly. Logistic regression models can be analyzed to understand the importance of the different PHA and app features they are built with, allowing us to improve our feature engineering process. After a few cycles of training, evaluation, and improvement, we can launch the best models in production and monitor their performance.

For more complex cases, we employ deep learning. Compared to logistic regression, deep learning is good at capturing complicated interactions between different features and extracting hidden patterns. The millions of apps in Google Play provide a rich dataset, which is advantageous to deep learning.

In addition to our targeted feature engineering efforts, we experiment with many aspects of deep neural networks. For example, a deep neural network can have multiple layers and each layer has several neurons to process signals. We can experiment with the number of layers and neurons per layer to change model behaviors.

We also adopt unsupervised machine learning methods. Many PHAs use similar abuse techniques and tricks, so they look almost identical to each other. An unsupervised approach helps define clusters of apps that look or behave similarly, which allows us to mitigate and identify PHAs more effectively. We can automate the process of categorizing that type of app if we are confident in the model or can request help from a human expert to validate what the model found.

PHAs are constantly evolving, so our models need constant updating and monitoring. In production, models are fed with data from recent apps, which help them stay relevant. However, new abuse techniques and behaviors need to be continuously detected and fed into our machine learning models to be able to catch new PHAs and stay on top of recent trends. This is a continuous cycle of model creation and updating that also requires tuning to ensure that the precision and coverage of the system as a whole matches our detection goals.

Looking forward

As part of Google's AI-first strategy, our work leverages many machine learning resources across the company, such as tools and infrastructures developed by Google Brain and Google Research. In 2017, our machine learning models successfully detected 60.3% of PHAs identified by Google Play Protect, covering over 2 billion Android devices. We continue to research and invest in machine learning to scale and simplify the detection of PHAs in the Android ecosystem.

Acknowledgements

This work was developed in joint collaboration with Google Play Protect, Safe Browsing and Play Abuse teams with contributions from Andrew Ahn, Hrishikesh Aradhye, Daniel Bali, Hongji Bao, Yajie Hu, Arthur Kaiser, Elena Kovakina, Salvador Mandujano, Melinda Miller, Rahul Mishra, Sebastian Porst, Monirul Sharif, Sri Somanchi, Sai Deep Tetali, and Zhikun Wang.

Introducing the Android Ecosystem Security Transparency Report

November 8, 2018

Posted by Jason Woloz and Eugene Liderman, Android Security & Privacy Team

Update: We identified a bug that affected how we calculated data from Q3 2018 in the Transparency Report. This bug created inconsistencies between the data in the report and this blog post. The data points in this blog post have been corrected.

As shared during the What's new in Android security session at Google I/O 2018, transparency and openness are important parts of Android's ethos. We regularly blog about new features and enhancements and publish an annual Android Security Year in Review, which highlights Android ecosystem trends. To provide more frequent insights, we're introducing a quarterly Android Ecosystem Security Transparency Report. This report is the latest addition to our Transparency Report site, which began in 2010 to show how the policies and actions of governments and corporations affect privacy, security, and access to information online.

This Android Ecosystem Security Transparency Report covers how often a routine, full-device scan by Google Play Protect detects a device with PHAs installed. Google Play Protect is built-in protection on Android devices that scans over 50 billion apps daily from inside and outside of Google Play. These scans look for evidence of Potentially Harmful Applications (PHAs). If the scans find a PHA, Google Play Protect warns the user and can disable or remove PHAs. In Android's first annual Android Security Year in Review from 2014, fewer than 1% of devices had PHAs installed. The percentage has declined steadily over time and this downward trend continues through 2018. The transparency report covers PHA rates in three areas: market segment (whether a PHA came from Google Play or outside of Google Play), Android version, and country.

Devices with Potentially Harmful Applications installed by market segment

Google works hard to protect your Android device: no matter where your apps come from. Continuing the trend from previous years, Android devices that only download apps from Google Play are 9 times less likely to get a PHA than devices that download apps from other sources. Before applications become available in Google Play they undergo an application review to confirm they comply with Google Play policies. Google uses a risk scorer to analyze apps to detect potentially harmful behavior. When Google’s application risk analyzer discovers something suspicious, it flags the app and refers the PHA to a security analyst for manual review if needed. We also scan apps that users download to their device from outside of Google Play. If we find a suspicious app, we also protect users from that—even if it didn't come from Google Play.

In the Android Ecosystem Security Transparency Report, the Devices with Potentially Harmful Applications installed by market segment chart shows the percentage of Android devices that have one or more PHAs installed over time. The chart has two lines: PHA rate for devices that exclusively install from Google Play and PHA rate for devices that also install from outside of Google Play. In 2017, on average 0.09% of devices that exclusively used Google Play had one or more PHAs installed. The first three quarters in 2018 averaged a lower PHA rate of 0.08%.

The security of devices that installed apps from outside of Google Play also improved. In 2017, ~0.82% of devices that installed apps from outside of Google Play were affected by PHA; in the first three quarters of 2018, ~0.68% were affected. Since 2017, we've reduced this number by expanding the auto-disable feature which we covered on page 10 in the 2017 Year in Review. While malware rates fluctuate from quarter to quarter, our metrics continue to show a consistent downward trend over time. We'll share more details in our 2018 Android Security Year in Review in early 2019.

Devices with Potentially Harmful Applications installed by Android version

Newer versions of Android are less affected by PHAs. We attribute this to many factors, such as continued platform and API hardening, ongoing security updates and app security and developer training to reduce apps' access to sensitive data. In particular, newer Android versions—such as Nougat, Oreo, and Pie—are more resilient to privilege escalation attacks that had previously allowed PHAs to gain persistence on devices and protect themselves against removal attempts. The Devices with Potentially Harmful Applications installed by Android version chart shows the percentage of devices with a PHA installed, sorted by the Android version that the device is running.

Devices with Potentially Harmful Applications rate by top 10 countries

Overall, PHA rates in the ten largest Android markets have remained steady. While these numbers fluctuate on a quarterly basis due to the fluidity of the marketplace, we intend to provide more in depth coverage of what drove these changes in our annual Year in Review in Q1, 2019.

The Devices with Potentially Harmful Applications rate by top 10 countries chart shows the percentage of devices with at least one PHA in the ten countries with the highest volume of Android devices. India saw the most significant decline in PHAs present on devices, with the average rate of infection dropping by 34 percent. Indonesia, Mexico, and Turkey also saw a decline in the likelihood of PHAs being present on devices in the region. South Korea saw the lowest number of devices containing PHA, with only 0.1%.

Check out the report

Over time, we'll add more insights into the health of the ecosystem to the Android Ecosystem Security Transparency Report. If you have any questions about terminology or the products referred to in this report please review the FAQs section of the Transparency Report. In the meantime, check out our new blog post and video outlining Android’s performance in Gartner’s Mobile OSs and Device Security: A Comparison of Platforms report.

A New Chapter for OSS-Fuzz

November 6, 2018

Posted by Matt Ruhstaller, TPM and Oliver Chang, Software Engineer, Google Security Team

Open Source Software (OSS) is extremely important to Google, and we rely on OSS in a variety of customer-facing and internal projects. We also understand the difficulty and importance of securing the open source ecosystem, and are continuously looking for ways to simplify it.

For the OSS community, we currently provide OSS-Fuzz, a free continuous fuzzing infrastructure hosted on the Google Cloud Platform. OSS-Fuzz uncovers security vulnerabilities and stability issues, and reports them directly to developers. Since launching in December 2016, OSS-Fuzz has reported over 9,000 bugs directly to open source developers.

In addition to OSS-Fuzz, Google's security team maintains several internal tools for identifying bugs in both Google internal and Open Source code. Until recently, these issues were manually reported to various public bug trackers by our security team and then monitored until they were resolved. Unresolved bugs were eligible for the Patch Rewards Program. While this reporting process had some success, it was overly complex. Now, by unifying and automating our fuzzing tools, we have been able to consolidate our processes into a single workflow, based on OSS-Fuzz. Projects integrated with OSS-Fuzz will benefit from being reviewed by both our internal and external fuzzing tools, thereby increasing code coverage and discovering bugs faster.

We are committed to helping open source projects benefit from integrating with our OSS-Fuzz fuzzing infrastructure. In the coming weeks, we will reach out via email to critical projects that we believe would be a good fit and support the community at large. Projects that integrate are eligible for rewards ranging from $1,000 (initial integration) up to $20,000 (ideal integration); more details are available here. These rewards are intended to help offset the cost and effort required to properly configure fuzzing for OSS projects. If you would like to integrate your project with OSS-Fuzz, please submit your project for review. Our goal is to admit as many OSS projects as possible and ensure that they are continuously fuzzed.

Once contacted, we might provide a sample fuzz target to you for easy integration. Many of these fuzz targets are generated with new technology that understands how library APIs are used appropriately. Watch this space for more details on how Google plans to further automate fuzz target creation, so that even more open source projects can benefit from continuous fuzzing.

Thank you for your continued contributions to the Open Source community. Let’s work together on a more secure and stable future for Open Source Software.

Security Blog

Announcing the Google Security and Privacy Research Awards

Industry collaboration leads to takedown of the “3ve” ad fraud operation

Combating Potentially Harmful Applications with Machine Learning at Google: Datasets and Models

Introducing the Android Ecosystem Security Transparency Report

A New Chapter for OSS-Fuzz

Labels

Archive

Feed