Google Online Security Blog

Hacking for Defenders: approaches to DARPA’s AI Cyber Challenge

June 25, 2024

Oliver Chang, Jonathan Metzman, OSS-Fuzz and Alex Rebert, Security Engineering

The US Defense Advanced Research Projects Agency, DARPA, recently kicked off a two-year AI Cyber Challenge (AIxCC), inviting top AI and cybersecurity experts to design new AI systems to help secure major open source projects which our critical infrastructure relies upon. As AI continues to grow, it’s crucial to invest in AI tools for Defenders, and this competition will help advance technology to do so.

Google’s OSS-Fuzz and Security Engineering teams have been excited to assist AIxCC organizers in designing their challenges and competition framework. We also playtested the competition by building a Cyber Reasoning System (CRS) tackling DARPA’s exemplar challenge.

This blog post will share our approach to the exemplar challenge using open source technology found in Google’s OSS-Fuzz, highlighting opportunities where AI can supercharge the platform’s ability to find and patch vulnerabilities, which we hope will inspire innovative solutions from competitors.

Leveraging OSS-Fuzz

AIxCC challenges focus on finding and fixing vulnerabilities in open source projects. OSS-Fuzz, our fuzz testing platform, has been finding vulnerabilities in open source projects as a public service for years, resulting in over 11,000 vulnerabilities found and fixed across 1200+ projects. OSS-Fuzz is free, open source, and its projects and infrastructure are shaped very similarly to AIxCC challenges. Competitors can easily reuse its existing toolchains, fuzzing engines, and sanitizers on AIxCC projects. Our baseline Cyber Reasoning System (CRS) mainly leverages non-AI techniques and has some limitations. We highlight these as opportunities for competitors to explore how AI can advance the state of the art in fuzz testing.

Fuzzing the AIxCC challenges

For userspace Java and C/C++ challenges, fuzzing with engines such as libFuzzer, AFL(++), and Jazzer is straightforward because they use the same interface as OSS-Fuzz.

Fuzzing the kernel is trickier, so we considered two options:

Syzkaller, an unsupervised coverage guided kernel fuzzer
A general purpose coverage guided fuzzer, such as AFL

Syzkaller has been effective at finding Linux kernel vulnerabilities, but is not suitable for AIxCC because Syzkaller generates sequences of syscalls to fuzz the whole Linux kernel, while AIxCC kernel challenges (exemplar) come with a userspace harness to exercise specific parts of the kernel.

Instead, we chose to use AFL, which is typically used to fuzz userspace programs. To enable kernel fuzzing, we followed a similar approach to an older blog post from Cloudflare. We compiled the kernel with KCOV and KASAN instrumentation and ran it virtualized under QEMU. Then, a userspace harness acts as a fake AFL forkserver and runs each input by executing the sequence of syscalls to be fuzzed.

After every input execution, the harness read the KCOV coverage and stored it in AFL’s coverage counters via shared memory to enable coverage-guided fuzzing. The harness also checked the kernel dmesg log after every run to discover whether or not the input caused a KASAN sanitizer to trigger.
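As a rough illustration of these two per-input steps, here is a minimal Python sketch (the real harness is native code). It assumes the KCOV program counters have already been read into a list, folds them into an AFL-style 64 KiB hit-count map, and scans dmesg text for the "BUG: KASAN:" line that KASAN reports start with. The hashing scheme and the function names are hypothetical, chosen only for illustration.

```python
import re

MAP_SIZE = 1 << 16  # AFL's default coverage map size (64 KiB)

# KASAN reports begin with a line such as:
#   BUG: KASAN: slab-out-of-bounds in some_function+0x1c/0x40
KASAN_RE = re.compile(r"BUG: KASAN: (\S+) in (\S+)")


def fold_coverage(pcs, bitmap):
    """Fold KCOV program counters into an AFL-style edge hit-count map.

    The hash below is a hypothetical stand-in, not the one used by AFL
    or by Cloudflare's harness.
    """
    prev = 0
    for pc in pcs:
        cur = (pc ^ (pc >> 16)) % MAP_SIZE
        idx = cur ^ prev                         # edge = (previous, current)
        bitmap[idx] = min(bitmap[idx] + 1, 255)  # saturating 8-bit counter
        prev = cur >> 1
    return bitmap


def find_kasan_report(dmesg_text):
    """Return (bug_type, function) for the first KASAN report, or None."""
    match = KASAN_RE.search(dmesg_text)
    if match is None:
        return None
    bug_type, location = match.groups()
    return bug_type, location.split("+")[0]  # drop the +offset/size suffix
```

A fuzzing loop would call `fold_coverage` on the shared-memory map after each run and treat a non-`None` result from `find_kasan_report` as a crash.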

Some changes to Cloudflare’s harness were required in order for this to be pluggable with the provided kernel challenges. We needed to turn the harness into a library/wrapper that could be linked against arbitrary AIxCC kernel harnesses.

AIxCC challenges come with their own main() which takes in a file path. The main() function opens and reads this file, and passes it to the harness() function, which takes in a buffer and size representing the input. We made our wrapper work by wrapping the main() during compilation via $CC -Wl,--wrap=main harness.c harness_wrapper.a

The wrapper starts by setting up KCOV, the AFL forkserver, and shared memory. The wrapper also reads the input from stdin (which is what AFL expects by default) and passes it to the harness() function in the challenge harness.

Because AIxCC's harnesses aren't within our control and may misbehave, we had to be careful with memory or FD leaks within the challenge harness. Indeed, the provided harness has various FD leaks, which means that fuzzing it will very quickly become useless as the FD limit is reached.

To address this, we could either:

Forcibly close FDs created while the harness runs, by listing /proc/self/fd before and after each execution of the harness and closing any new entries, or
Just fork the userspace harness by actually forking in the forkserver.

The first approach worked for us. The latter is likely most reliable, but may worsen performance.
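The first option can be sketched in a few lines of Python (our actual wrapper is native code, and this assumes Linux with /proc mounted). The helper names are hypothetical. One subtlety: listing /proc/self/fd itself opens a short-lived FD, so the sketch filters out entries that are no longer valid by the time the listing returns.

```python
import os


def open_fds():
    """Return the set of FDs currently open in this process (Linux only)."""
    fds = set()
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        try:
            os.fstat(fd)
        except OSError:
            continue  # the transient FD used to list the directory
        fds.add(fd)
    return fds


def run_and_close_leaks(harness, data):
    """Run harness(data), then close any FDs it left open.

    Returns the list of FDs that were closed.
    """
    before = open_fds()
    leaked = []
    try:
        harness(data)
    finally:
        leaked = sorted(open_fds() - before)
        for fd in leaked:
            os.close(fd)
    return leaked
```

This keeps everything in one process, which is why it is cheaper than forking for every input, at the cost of not cleaning up other kinds of leaked state such as memory.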

All of these efforts enabled afl-fuzz to fuzz the Linux exemplar, but the vulnerability cannot be easily found even after hours of fuzzing, unless provided with seed inputs close to the solution.

Improving fuzzing with AI

This limitation of fuzzing highlights a potential area for competitors to explore AI’s capabilities. The complicated input format, combined with slow execution speeds, makes the exact reproducer hard to discover. Using AI could unlock the ability for fuzzing to find this vulnerability quickly, for example by asking an LLM to generate seed inputs (or a script to generate them) that are close to the expected input format, based on the harness source code. Competitors might find inspiration in some interesting experiments done by Brendan Dolan-Gavitt from NYU, which show promise for this idea.

Another approach: static analysis

One alternative to fuzzing to find vulnerabilities is to use static analysis. Static analysis traditionally has challenges with generating high amounts of false positives, as well as difficulties in proving exploitability and reachability of issues it points out. LLMs could help dramatically improve bug finding capabilities by augmenting traditional static analysis techniques with increased accuracy and analysis capabilities.

Proof of understanding (PoU)

Once fuzzing finds a reproducer, we can produce key evidence required for the PoU:

The culprit commit, which can be found from git history bisection.
The expected sanitizer, which can be found by running the reproducer to get the crash and parsing the resulting stacktrace.
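The bisection step is the same idea that git bisect automates. A minimal sketch follows, assuming an ordered list of commits (oldest first) and a hypothetical `crashes(commit)` callback that builds that commit and runs the reproducer. It also assumes the oldest commit is good, the newest is bad, and the bug does not come and go in between.

```python
def first_bad_commit(commits, crashes):
    """Binary-search for the first commit at which the reproducer crashes.

    commits: list ordered oldest -> newest.
    crashes: hypothetical callback, commit -> bool.
    """
    lo, hi = 0, len(commits) - 1
    if crashes(commits[lo]) or not crashes(commits[hi]):
        raise ValueError("need a good oldest commit and a bad newest commit")
    # Invariant: commits[lo] is good, commits[hi] is bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each probe costs one build plus one reproducer run, so the number of builds grows with the logarithm of the history length.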

Next step: “patching” via delta debugging

Once the culprit commit has been identified, one obvious way to “patch” the vulnerability is to just revert this commit. However, the commit may include legitimate changes that are necessary for functionality tests to pass. To ensure functionality doesn’t break, we could apply delta debugging: we progressively include or exclude different parts of the culprit commit until the vulnerability no longer triggers and all functionality tests still pass.

This is a rather brute force approach to “patching.” There is no comprehension of the code being patched and it will likely not work for more complicated patches that include subtle changes required to fix the vulnerability without breaking functionality.
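To make the idea concrete, here is a simplified greedy reduction in Python. It is a one-hunk-at-a-time variant, not the full delta debugging (ddmin) algorithm. The `crashes` and `tests_pass` callbacks are hypothetical: each takes the list of culprit-commit hunks that are applied, and would rebuild the project and run the reproducer or the functionality tests.

```python
def minimal_revert(hunks, crashes, tests_pass):
    """Re-apply as many hunks of the culprit commit as possible.

    Starts from a full revert and adds hunks back one by one, keeping a
    hunk only if the vulnerability still does not trigger. Returns
    (kept, reverted), or None if the result fails the functionality tests.
    """
    kept = []  # full revert: no hunk of the culprit commit applied
    for hunk in hunks:
        trial = kept + [hunk]
        if not crashes(trial):  # safe to re-apply this hunk
            kept = trial
    reverted = [h for h in hunks if h not in kept]
    return (kept, reverted) if tests_pass(kept) else None
```

Because the result depends on the order in which hunks are tried, this sketch can miss a valid combination that a more thorough search would find.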

Improving patching with AI

These limitations highlight a second area for competitors to apply AI’s capabilities. One approach might be to use an LLM to suggest patches. A 2024 whitepaper from Google walks through one way to build an LLM-based automated patching pipeline.

Competitors will need to address the following challenges:

Validating the patches by running crashes and tests to ensure the crash was prevented and the functionality was not impacted
Narrowing prompts to include only the functions present in the crashing stack trace, to fit prompt limitations
Building a validation step to filter out invalid patches
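The prompt-narrowing step could look roughly like the sketch below. It assumes sanitizer-style stack frames of the form "#0 0x... in function file:line" and a hypothetical `sources` dictionary that maps function names to their source text. The prompt wording and the character budget are placeholders.

```python
import re

# Matches frames such as: "    #0 0x4f3b2c in parse_header /src/lib/parse.c:54:7"
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+(\S+)", re.MULTILINE)


def functions_in_trace(trace):
    """Return function names from the trace, innermost first, without repeats."""
    names = []
    for name in FRAME_RE.findall(trace):
        if name not in names:
            names.append(name)
    return names


def build_prompt(trace, sources, max_chars=8000):
    """Build a prompt holding the trace plus only the functions it mentions."""
    parts = ["Fix the bug that causes this crash:\n" + trace]
    used = len(parts[0])
    for name in functions_in_trace(trace):
        body = sources.get(name)
        if body is None:
            continue  # no source available for this frame (e.g. libc)
        if used + len(body) > max_chars:
            break  # stay within the prompt budget
        parts.append(body)
        used += len(body)
    return "\n\n".join(parts)
```

Frames are added innermost first, so when the budget runs out it is the functions furthest from the crash that get dropped.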

Using an LLM agent is likely another promising approach, where competitors could combine an LLM’s generation capabilities with the ability to compile and receive debug test failures or stacktraces iteratively.

Advancing security for everyone

Collaboration is essential to harness the power of AI as a widespread tool for defenders. As advancements emerge, we’ll integrate them into OSS-Fuzz, meaning that the outcomes from AIxCC will directly improve security for the open source ecosystem. We’re looking forward to the innovative solutions that result from this competition!

Staying Safe with Chrome Extensions

June 20, 2024

Posted by Benjamin Ackerman, Anunoy Ghosh and David Warren, Chrome Security Team

Chrome extensions can boost your browsing, empowering you to do anything from customizing the look of sites to providing personalized advice when you’re planning a vacation. But as with any software, extensions can also introduce risk.

That’s why we have a team whose only job is to focus on keeping you safe as you install and take advantage of Chrome extensions. Our team:

Provides you with a personalized summary of the extensions you’ve installed
Reviews extensions before they’re published on the Chrome Web Store
Continuously monitors extensions after they’re published

A summary of your extensions

The top of the extensions page (chrome://extensions) warns you of any extensions you have installed that might pose a security risk. (If you don’t see a warning panel, you probably don’t have any extensions you need to worry about.) The panel includes:

Extensions suspected of including malware
Extensions that violate Chrome Web Store policies
Extensions that have been unpublished by a developer, which might indicate that an extension is no longer supported
Extensions that aren’t from the Chrome Web Store
Extensions that haven’t published what they do with data they collect and other privacy practices

You’ll get notified when Chrome’s Safety Check has recommendations for you or you can check on your own by running Safety Check. Just type “run safety check” in Chrome’s address bar and select the corresponding shortcut: “Go to Chrome safety check.”

User flow of removing extensions highlighted by Safety Check.

Besides the Safety Check, you can visit the extensions page directly in a number of ways:

Navigate to chrome://extensions
Click the puzzle icon and choose “Manage extensions”
Click the More menu and choose Extensions > Manage Extensions

Reviewing extensions before they’re published

Before an extension is even accessible to install from the Chrome Web Store, we have two levels of verification to ensure an extension is safe:

An automated review: Each extension gets examined by our machine-learning systems to spot possible violations or suspicious behavior.
A human review: Next, a team member examines the images, descriptions, and public policies of each extension. Depending on the results of both the automated and manual review, we may perform an even deeper and more thorough review of the code.

This review process weeds out the overwhelming majority of bad extensions before they even get published. In 2024, less than 1% of all installs from the Chrome Web Store were found to include malware. We're proud of this record and yet some bad extensions still get through, which is why we also monitor published extensions.

Monitoring published extensions

The same Chrome team that reviews extensions before they get published also reviews extensions that are already on the Chrome Web Store. And just like the pre-check, this monitoring includes both human and machine reviews. We also work closely with trusted security researchers outside of Google, and even pay researchers who report possible threats to Chrome users through our Developer Data Protection Rewards Program.

What about extensions that get updated over time, or are programmed to execute malicious code at a later date? Our systems monitor for that as well, by periodically reviewing what extensions are actually doing and comparing that to the stated objectives defined by each extension in the Chrome Web Store.

If the team finds that an extension poses a severe risk to Chrome users, it’s immediately removed from the Chrome Web Store and the extension gets disabled on all browsers that have it installed.

The extensions page highlights when you have a potentially unsafe extension downloaded

Other steps you can take to stay safe

Review new extensions before installing them

The Chrome Web Store provides useful information about each extension and its developer. The following information should help you decide whether it’s safe to install an extension:

Verified and featured badges are awarded by the Chrome team to extensions that follow our technical best practices and meet a high standard of user experience and design
Ratings and reviews from our users
Information about the developer
Privacy practices, including information about how an extension handles your data

Be careful of sites that try to quickly persuade you to install extensions, especially if the site has little in common with the extension.

Review extensions you’ve already installed

Even though Safety Check and your Extensions page (chrome://extensions) warn you of extensions that might pose a risk, it’s still a good idea to review your extensions from time to time.

Uninstall extensions that you no longer use.
Review the description of an extension in the Chrome Web Store, considering the extension’s ratings, reviews, and privacy practices — reviews can change over time.
Compare an extension’s stated goals with 1) the permissions requested by an extension and 2) the privacy practices published by the extension. If requested permissions don’t align with stated goals, consider uninstalling the extension.
Limit the sites an extension has permission to work on.

Enable Enhanced Protection

The Enhanced protection mode of Safe Browsing is the highest level of protection that Chrome offers. Not only does this mode provide you with the best protections against phishing and malware, but it also provides additional features targeted to keep you safe against potentially harmful extensions. Threats are constantly evolving and Safe Browsing’s Enhanced protection mode is the best way to ensure that you have the most advanced security features in Chrome. You can enable it by opening the Safe Browsing settings page in Chrome (chrome://settings/security) and selecting “Enhanced”.

Time to challenge yourself in the 2024 Google CTF

June 12, 2024

Hlynur Gudmundsson, Software Engineer

It’s Google CTF time! Install your tools, commit your scripts, and clear your schedule. The competition kicks off on June 21 2024 6:00 PM UTC and runs through June 23 2024 6:00 PM UTC. Registration is now open at goo.gle/ctf.

Join the Google CTF (at goo.gle/ctf), a thrilling arena to showcase your technical prowess. The Google CTF consists of a set of computer security puzzles (or challenges) involving reverse-engineering, memory corruption, cryptography, web technologies, and more. Participants can use obscure security knowledge to find exploits through bugs and creative misuse, and with each completed challenge your team will earn points and move up through the ranks.

The top 8 teams of the Google CTF will qualify for our Hackceler8 competition taking place in Málaga, Spain later this year as a part of our larger Escal8 event. Hackceler8 is our experimental esport-style hacking game competition, custom-made to mix CTF and speedrunning.

Screenshot from last year’s Hackceler8 game

In the competition, teams need to find clever ways to abuse the game features to capture flags as quickly as possible.

Last year, teams assumed the role of Bartholomew (Mew for short), the fuzzy and adorable protagonist of Hackceler8 2023, set to defeat and overcome the evil rA.Ibbit taking over Silicon Valley! What adventures will Mew encounter this year? See the 2023 grand final to get a sense of the story and gameplay. The prize pool for this year’s Google CTF and Hackceler8 stands at more than $32,000.

Itching to get started early? Want to learn more, or get a leg up on the competition? Review challenges from previous years, including previous Hackceler8 matches, all open-sourced here. Or gain inspiration by binge watching hours of Hackceler8 2023 videos!

If you are just starting out in this space, check out our documentary H4CK1NG GOOGLE; it’s a great way to get acquainted with security. We also recommend checking out this year’s Beginner’s Quest, launching later this summer, which will teach you some of the tools and tricks through simpler gamified challenges. For example, last year we explored hacking through time – you can use this to prepare for what’s yet to come.

Whether you’re a seasoned CTF player or just curious about cybersecurity and ethical hacking, we want to invite you to join us. Sign up for the Google CTF to expand your skill set, meet new friends in the security community, and even watch the pros in action. For the latest announcements, see goo.gle/ctf, subscribe to our mailing list, or follow us on Twitter @GoogleVRP. Interested in bug hunting for Google? Check out bughunters.google.com. See you there!

On Fire Drills and Phishing Tests

May 22, 2024

Matt Linton, Chaos Specialist

In the late 19th and early 20th century, a series of catastrophic fires in short succession led an outraged public to demand action from the budding fire protection industry. Among the experts, one initial focus was on “Fire Evacuation Tests”. The earliest of these tests focused on individual performance and tested occupants on their evacuation speed, sometimes performing the tests “by surprise” as though the fire drill were a real fire. These early tests were more likely to result in injuries to the test-takers than any improvement in survivability. It wasn’t until introducing better protective engineering - wider doors, push bars at exits, firebreaks in construction, lighted exit signs, and so on - that survival rates from building fires began to improve. As protections evolved over the years and improvements like mandatory fire sprinklers became required in building code, survival rates have continued to improve steadily, and “tests” have evolved into announced, advanced training and posted evacuation plans.

In this blog, we will analyze the modern practice of Phishing “Tests” as a cybersecurity control as it relates to industry-standard fire protection practices.

Modern “Phishing tests” strongly resemble the early “Fire tests”

Google currently operates under regulations (for example, FedRAMP in the USA) that require us to perform annual “Phishing Tests.” In these mandatory tests, the Security team creates and sends phishing emails to Googlers, counts how many interact with the email, and educates them on how to “not be fooled” by phishing. These exercises typically collect reporting metrics on sent emails and how many employees “failed” by clicking the decoy link. Usually, further education is required for employees who fail the exercise. Per the FedRAMP pen-testing guidance doc: “Users are the last line of defense and should be tested.”

These tests resemble the first “evacuation tests” that building occupants were once subjected to. They require individuals to recognize the danger and react individually in an ‘appropriate’ way, and individuals are told that any failure is an individual failure on their part rather than a systemic issue. Worse, FedRAMP guidance requires companies to bypass or eliminate all systematic controls during the tests to ensure the likelihood of a person clicking on a phishing link is artificially maximized.

Among the harmful side effects of these tests:

There is no evidence that the tests result in fewer incidences of successful phishing campaigns;

Phishing (or more generically social engineering) remains a top vector for attackers establishing footholds at companies.
Research shows that these tests do not effectively prevent people from being fooled. This study with 14,000 participants found a counterproductive effect of phishing tests: “repeat clickers” will consistently fail tests despite recent interventions.

Some (e.g., FedRAMP) phishing tests require bypassing existing anti-phishing defenses. This creates an inaccurate perception of actual risks, allows penetration testing teams to avoid having to mimic actual modern attacker tactics, and creates a risk that the allowlists put in place to facilitate the test could be accidentally left in place and reused by attackers.
There has been a significantly increased load on Detection and Incident Response (D&R) teams during these tests, as users saturate them with thousands of needless reports.
Employees are upset by them and feel security is “tricking them”, which degrades the trust that security teams need both to make meaningful systemic improvements and to get employees to take timely action during actual security events.
At larger enterprises with multiple independent products, people can end up with numerous overlapping required phishing tests, causing repeated burdens.

But are users the last line of defense?

Training humans to avoid phishing or social engineering with a 100% success rate is a likely impossible task. There is value in teaching people how to spot phishing and social engineering so they can alert security to perform incident response. By ensuring that even a single user reports attacks in progress, companies can activate full-scope responses which are a worthwhile defensive control that can quickly mitigate even advanced attacks. But, much like the Fire Safety professional world has moved to regular pre-announced evacuation training instead of surprise drills, the information security industry should move toward training that de-emphasizes surprises and tricks and instead prioritizes accurate training of what we want staff to do the moment they spot a phishing email - with a particular focus on recognizing and reporting the phishing threat.

In short - we need to stop doing phishing tests and start doing phishing fire drills.

A “phishing fire drill” would aim to accomplish the following:

Educate our users about how to spot phishing emails
Inform the users on how to report phishing emails
Allow employees to practice reporting a phishing email in the manner that we would prefer, and
Collect useful metrics for auditors, such as:

The number of users who completed the practice exercise of reporting the email as a phishing email
The time between the email opening and the first report of phishing
Time of first escalation to the security team (and time delta)
Number of reports at 1 hour, 4 hours, 8 hours, and 24 hours post-delivery
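As a small illustration, the metrics above could be computed from a list of report timestamps as in the sketch below. It assumes one report per user and measures the time to first report from delivery rather than from the moment the email was opened, since open events may not be available. The function and field names are hypothetical.

```python
from datetime import datetime, timedelta


def drill_metrics(delivered_at, report_times, checkpoints_hours=(1, 4, 8, 24)):
    """Summarize a phishing drill from per-user report timestamps."""
    reports = sorted(report_times)
    first = reports[0] - delivered_at if reports else None
    by_hour = {
        hours: sum(1 for t in reports if t - delivered_at <= timedelta(hours=hours))
        for hours in checkpoints_hours
    }
    return {
        "total_reports": len(reports),
        "time_to_first_report": first,
        "reports_by_hour": by_hour,
    }
```

For example, a drill delivered at 09:00 with reports at 09:10, 12:00, and 15:00 the next day has three reports in total, a ten-minute time to first report, and two reports within the 24-hour checkpoint.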

Example

When performing a phishing drill, someone would send an email announcing itself as a phishing email and with relevant instructions or specific tasks to perform. An example text is provided below.

Hello! I am a Phishing Email.

This is a drill - this is only a drill!

If I were an actual phishing email, I might ask you to log into a malicious site with your actual username or password, or I might ask you to run a suspicious command like <example command>. I might try any number of tricks to get access to your Google Account or workstation.

You can learn more about recognizing phishing emails at <LINK TO RESOURCE> and even test yourself to see how good you are at spotting them. Regardless of the form a phishing email takes, you can quickly report them to the security team when you notice they’re not what they seem.

To complete the annual phishing drill, please report me. To do that, <company-specific instructions on where to report phishing>.

Thanks for doing your part to keep <company> safe!

Tricky. Phish, Ph.D

You can’t “fix” people, but you can fix the tools.

Phishing and Social Engineering aren’t going away as attack techniques. As long as humans are fallible and social creatures, attackers will have ways to manipulate the human factor. The more effective approach to both risks is a focused pursuit of secure-by-default systems in the long term, and a focus on investment in engineering defenses such as unphishable credentials (like passkeys) and implementing multi-party approval for sensitive security contexts throughout production systems. It’s because of investments in architectural defenses like these that Google hasn’t had to seriously worry about password phishing in nearly a decade.

Educating employees about alerting security teams of attacks in progress remains a valuable and essential addition to a holistic security posture. However, there’s no need to make this adversarial, and we don’t gain anything by “catching” people “failing” at the task. Let's stop engaging in the same old failed protections and follow the lead of more mature industries, such as Fire Protection, which has faced these problems before and already settled on a balanced approach.
