
by Miško Hevery

Recently many of you, after reading the Guide to Testability, wrote to tell me that there is nothing wrong with static methods. After all, what can be easier to test than Math.abs()? And Math.abs() is a static method! If abs() were an instance method, one would have to instantiate the object first, and that may prove to be a problem. (See how to think about the new operator, and class does real work.)

The basic issue with static methods is that they are procedural code. I have no idea how to unit-test procedural code. Unit-testing assumes that I can instantiate a piece of my application in isolation. During the instantiation I wire the dependencies with mocks/friendlies which replace the real dependencies. With procedural programming there is nothing to "wire" since there are no objects; the code and data are separate.

Here is another way of thinking about it. Unit-testing needs seams. A seam is where we prevent the execution of the normal code path, and it is how we achieve isolation of the class under test. Seams work through polymorphism: we override/implement a class/interface and then wire the class under test differently in order to take control of the execution flow. With static methods there is nothing to override. Yes, static methods are easy to call, but if the static method calls another static method there is no way to override the called method's dependency.

Let's do a mental exercise. Suppose your application has nothing but static methods. (Yes, code like that is possible to write; it is called procedural programming.) Now imagine the call graph of that application. If you try to execute a leaf method, you will have no issue setting up its state and asserting all of the corner cases. The reason is that a leaf method makes no further calls. As you move further away from the leaves and closer to the root main() method, it will be harder and harder to set up the state in your test and harder to assert things. Many things will become impossible to assert. Your tests will get progressively larger. Once you reach the main() method you no longer have a unit test (as your unit is the whole application); you now have a scenario test. Imagine that the application you are trying to test is a word processor. There is not much you can assert from the main method.

We have already covered that global state is bad and how it makes your application hard to understand. If your application has no global state, then all of the input for your static method must come from its arguments. Chances are very good that you can move the method as an instance method to one of the method's arguments. (As in method(a, b) becomes a.method(b).) Once you move it you realize that that is where the method should have been to begin with. The use of static methods becomes an even worse problem when the static methods start accessing the global state of the application. What about methods which take no arguments? Well, either methodX() returns a constant, in which case there is nothing to test; or it accesses global state, which is bad; or it is a factory.
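As a small illustration of that method(a, b) to a.method(b) move (the classes and names below are made up for this sketch, not taken from the guide):

// Before: a static utility that operates on data handed to it.
final class PriceUtil {
  private PriceUtil() {}

  static double discountedTotal(Order order, double discountRate) {
    return order.getTotal() * (1.0 - discountRate);
  }
}

// After: the logic moves onto the argument it operates on, so a test
// can simply instantiate an Order and call the method directly.
final class Order {
  private final double total;

  Order(double total) {
    this.total = total;
  }

  double getTotal() {
    return total;
  }

  double discountedTotal(double discountRate) {
    return total * (1.0 - discountRate);
  }
}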

Sometimes a static method is a factory for other objects. This further exacerbates the testing problem. In tests we rely on the fact that we can wire objects differently, replacing important dependencies with mocks. Once a new operator is called we cannot substitute the constructed object with a sub-class. A caller of such a static factory is permanently bound to the concrete classes which the static factory method produced. In other words, the damage of the static method reaches far beyond the static method itself. Putting object graph wiring and construction code into a static method is extra bad, since object graph wiring is how we isolate things for testing.
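A short, hypothetical sketch of that binding (these names are mine, not from the post): every caller of the factory below gets the concrete implementation, and a test has no seam through which to supply a fake.

interface MessageSender {
  void send(String to, String body);
}

final class SmtpSender implements MessageSender {
  public void send(String to, String body) {
    // imagine a real network call here
  }
}

// A static factory: every caller is permanently bound to SmtpSender.
final class Senders {
  private Senders() {}

  static MessageSender create() {
    return new SmtpSender(); // a test has no seam to substitute a fake
  }
}

class OrderConfirmation {
  private final MessageSender sender = Senders.create(); // seam lost

  // A more testable shape asks for the collaborator instead of building it:
  // OrderConfirmation(MessageSender sender) { this.sender = sender; }
}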

"So leaf methods are ok to be static but other methods should not be?" I like to go a step further and simply say, static methods are not OK. The issue is that a methods starts off being a leaf and over time more and more code is added to them and they lose their positions as a leafs. It is way to easy to turn a leaf method into none-leaf method, the other way around is not so easy. Therefore a static leaf method is a slippery slope which is waiting to grow and become a problem. Static methods are procedural! In OO language stick to OO. And as far as Math.abs(-5) goes, I think Java got it wrong. I really want to write -5.abs(). Ruby got that one right.

Posted by Lydia Ash, GTAC Conference Chair

The Google Test Automation Conference 2008 was a smashing success, in no small part due to all of our presenters and participants. A wonderful thank you from all of us at Google to everyone who participated! The tone of the conference really struck me as I watched how everyone came together around various topics. I don't think we would have had trouble keeping the conversations going even if an entire day had been dedicated to the moderated discussions.

We were able to get the slide decks from our presenters. These are listed below with their video links.
We will be evaluating where the next GTAC should be held, and your comments will help shape the next conference. Building on the successes of the past, next year should be even better! Stay tuned for a Save the Date notice sometime in the spring.

Opening Remarks - Lydia Ash
Video: http://www.youtube.com/watch?v=l5QmHXcNk4g

The Future of Testing - James A. Whittaker
Video coming soon...

Advances in Automated Software Testing Technologies - Elfriede Dustin and Marcus Borch
Video: http://www.youtube.com/watch?v=HEpSdSyU03I

Boosting Your Testing Productivity with Groovy - Andres Almiray
Video: http://www.youtube.com/watch?v=UvWTfVCWKJY
Slides: http://www.slideshare.net/aalmiray/gtac-boosting-your-testing-productivity-with-groovy/

Taming the Beast: How to Test an AJAX Application - Markus Clermont and John Thomas
Video: http://www.youtube.com/watch?v=5jjrTBFZWgk
Slides: http://docs.google.com/Presentation?id=dczwht9g_62gccsc9gg

The New Genomics: Software Development at Petabyte Scale - Matt Wood
Video: http://www.youtube.com/watch?v=A64WKH9gNI8
Slides part 1: http://docs.google.com/Presentation?id=dczwht9g_3318qqfb6f5
Slides part 2: http://docs.google.com/Presentation?id=dczwht9g_393d7zg4xcm

Using Cloud Computing to Automate Full-Scale System Tests - Marc-Elian Bégin and Charles Loomis
Video: http://www.youtube.com/watch?v=atyq-41Gnjc
Slides: http://docs.google.com/Presentation?id=dczwht9g_251gcv8cbfv

Practicing Testability in the Real World - Vishal Chowdhary
Video: http://www.youtube.com/watch?v=hL829wNaF78
Slides: http://docs.google.com/Presentation?id=dczwht9g_0hgd2w5rz

Context-Driven Test Automation: How to Build the System you Really Need - Pete Schneider
Video: http://www.youtube.com/watch?v=N9sm_zcpUEw
Slides: http://docs.google.com/Presentation?id=dczwht9g_236ccxj32fd

Automated Model-Based Testing of Web Applications - Atif M. Memon and Oluwaseun Akinmade
Video: http://www.youtube.com/watch?v=6LdsIVvxISU
Slides: http://www.cs.umd.edu/~atif/GTAC08/

The Value of Small Tests - Christopher Semturs
Video: http://www.youtube.com/watch?v=MpG2i_6nkUg
Slides: http://docs.google.com/Presentation?id=dckk962d_332cxtcsmhg

JInjector: A Coverage and End-To-End Testing Framework for J2ME and RIM - Julian Harty, Olivier Gaillard, and Michele Sama
Video: http://www.youtube.com/watch?v=B2v5jQ9NLVg
Slides: http://docs.google.com/Presentation?id=dczwht9g_82d7w8bqd9

Atom Publishing Protocol: Testing your Server Implementation - David Calavera
Video: http://www.youtube.com/watch?v=uRmWTfT91uQ
Slides: http://thinkincode.net/gtac_atomPub_testing_your_server_implementation.pdf

Simple Tools to Fight the Bigger Quality Battle: Continuous Integration Using Batch Files
and Task Scheduler - Komal Joshi and Patrick Martin
Video: http://www.youtube.com/watch?v=wgP7ejMBCCU
Slides: http://docs.google.com/Presentation?id=dczwht9g_141czcvc7md


Posted by Zhanyong Wan, Software Engineer

Five months ago we open-sourced Google C++ Testing Framework to help C++ developers write better tests. Enthusiastic users have embraced it and sent in numerous encouraging comments and suggestions, as well as patches to make it more useful. It was a truly gratifying experience for us.

Today, we are excited to release Google C++ Mocking Framework (Google Mock for short) under the new BSD license. When used with Google Test, it lets you easily create and use mock objects in C++ tests and rapid prototypes. If you aren't sure what mocks are or why you'll need them, our Why Google Mock? article will help explain why this is so exciting, and the Testing on the Toilet episode posted nearby on this blog gives a more light-hearted overview. In short, this technique can greatly improve the design and testability of software systems, as shown in this OOPSLA paper.

We are happily using Google Mock in more than 100 projects at Google. It works on Linux, Windows, and Mac OS X. Its benefits include:
  • Simple, declarative syntax for defining mocks
  • Rich set of matchers for validating function arguments
  • Intuitive syntax for controlling the behavior of a mock
  • Automatic verification of expectations
  • Easy extensibility through new user-defined matchers and actions
Our users inside Google have appreciated that Google Mock is easy and even fun to use, and is an effective tool for improving software quality. We hope you'll like it too. Interested? Please take a few minutes to read the documentation and download Google Mock. Be warned, though: mocking is addictive, so proceed at your own risk.

And... we'd love to hear from you!  If you have any questions or feedback, please meet us on the Google Mock Discussion Group. Happy mocking!


by Zhanyong Wan, Software Engineer


Life is unfair. You work every bit as hard as Joe the Java programmer next to you. Yet as a C++ programmer, you don't get to play with all the fancy programming tools Joe takes for granted.

In particular, without a good mocking framework, mock objects in C++ have to be rolled by hand. Boy, is that tedious! (Not to mention how error-prone it is.) Why should you endure this?

Dread no more. Google Mock is finally here to help! It's a Google-originated open-source framework for creating and using C++ mocks. Inspired by jMock and EasyMock, Google Mock is easy to use, yet flexible and extensible. All you need to get started is the ability to count from 0 to 10 and use an editor.

Think you can do it? Let's try this simple example: you have a ShoppingCart class that gets the tax rate from a server, and you want to test that it remembers to disconnect from the server even when the server has generated an error. It's easy to write the test using a mock tax server, which implements this interface:

class TaxServer {
 public:
  virtual ~TaxServer() {}

  // Returns the tax rate of a location
  // (by postal code) or -1 on error.
  virtual double FetchTaxRate(
      const string& postal_code) = 0;
  virtual void CloseConnection() = 0;
};

Here's how you mock it and use the mock server to verify the expected behavior of ShoppingCart:

class MockTaxServer : public TaxServer {     // #1
 public:
  MOCK_METHOD1(FetchTaxRate, double(const string&));
  MOCK_METHOD0(CloseConnection, void());
};

TEST(ShoppingCartTest, 
    StillCallsCloseIfServerErrorOccurs) {
  MockTaxServer mock_taxserver;              // #2
  EXPECT_CALL(mock_taxserver, FetchTaxRate(_))
    .WillOnce(Return(-1));                   // #3
  EXPECT_CALL(mock_taxserver, CloseConnection());
  ShoppingCart cart(&mock_taxserver);        // #4
  cart.CalculateTax();  // Calls FetchTaxRate()
                        // and CloseConnection().
}                                            // #5

  1. Derive the mock class from the interface. For each virtual method, count how many arguments it has, name the result n, and define it using MOCK_METHODn, whose arguments are the name and type of the method.

  2. Create an instance of the mock class. It will be used where you would normally use a real object.

  3. Set expectations on the mock object (How will it be used? What will it do?). For example, the first EXPECT_CALL says that FetchTaxRate() will be called and will return an error. The underscore (_) is a matcher that says the argument can be anything. Google Mock has many matchers you can use to precisely specify what the argument should be like. You can also define your own matcher or use an exact value.

  4. Exercise code that uses the mock object. You'll get an error immediately if a mock method is called more times than expected or with the wrong arguments.

  5. When the mock object is destroyed, it checks that all expectations on it have been satisfied.

You can also use Google Mock for rapid prototyping – and get a better design. To find out more, visit the project homepage at http://code.google.com/p/googlemock/. Now, be the first one on your block to use Google Mock and prepare to be envied. Did I say life is unfair?


Remember to download this episode and post it in your office!
Toilet-Friendly Version



by Miško Hevery

Google Tech Talks
November 20, 2008

ABSTRACT

Is your code full of if statements? Switch statements? Do you have the same switch statement in various places? When you make changes do you find yourself making the same change to the same if/switch in several places? Did you ever forget one?

This talk will discuss approaches to using Object Oriented techniques to remove many of those conditionals. The result is cleaner, tighter, better designed code that's easier to test, understand and maintain.
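The talk's video and slides carry the details; as a rough, invented illustration of the idea (not taken from the talk), a type-code switch that gets copied around can often be replaced by putting the behavior on the types themselves:

// Before: the same switch tends to get duplicated wherever a shape is handled.
double area(String shapeType, double a, double b) {
  switch (shapeType) {
    case "rectangle": return a * b;
    case "triangle":  return a * b / 2;
    default: throw new IllegalArgumentException(shapeType);
  }
}

// After: each shape knows its own area, so adding a new shape means
// adding a class instead of editing every switch.
interface Shape {
  double area();
}

final class Rectangle implements Shape {
  private final double width, height;
  Rectangle(double width, double height) { this.width = width; this.height = height; }
  public double area() { return width * height; }
}

final class Triangle implements Shape {
  private final double base, height;
  Triangle(double base, double height) { this.base = base; this.height = height; }
  public double area() { return base * height / 2; }
}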

Video



Slides

It is with great pleasure that I can finally open-source the Guide to Writing Testable Code.

I am including the first page here for you, but do come and check it out in detail.



To keep our code at Google in the best possible shape we provided our software engineers with these constant reminders. Now, we are happy to share them with the world.



Many thanks to these folks for inspiration and hours of hard work getting this guide done:


Flaw #1: Constructor does Real Work

Warning Signs

  • new keyword in a constructor or at field declaration
  • Static method calls in a constructor or at field declaration
  • Anything more than field assignment in constructors
  • Object not fully initialized after the constructor finishes (watch out for initialize methods)
  • Control flow (conditional or looping logic) in a constructor
  • Code does complex object graph construction inside a constructor rather than using a factory or builder
  • Adding or using an initialization block
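As a rough sketch of the warning signs above (the class names are invented for illustration, they are not from the guide), compare a constructor that does real work with one that only accepts its collaborators:

interface Database { /* query methods elided */ }

final class ProductionDatabase implements Database {
  static ProductionDatabase connect(String host) {
    // imagine an expensive network connection being opened here
    return new ProductionDatabase();
  }
}

// Flawed: the constructor does real work (a static call, more than field
// assignment), so a test cannot instantiate AccountView without hitting
// the real database.
class AccountView {
  private final Database db;

  AccountView() {
    this.db = ProductionDatabase.connect("prod-db");
  }
}

// Better: the constructor only assigns what it is handed,
// and a test can pass in an in-memory fake.
class TestableAccountView {
  private final Database db;

  TestableAccountView(Database db) {
    this.db = db;
  }
}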

Flaw #2: Digging into Collaborators

Warning Signs

  • Objects are passed in but never used directly (only used to get access to other objects)
  • Law of Demeter violation: method call chain walks an object graph with more than one dot (.)
  • Suspicious names: context, environment, principal, container, or manager
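A minimal, invented illustration of digging into collaborators: the class below only needs an Account, yet it asks for a whole context just to reach it.

class Account {
  private final double balance;
  Account(double balance) { this.balance = balance; }
  double getBalance() { return balance; }
}

class User {
  private final Account account;
  User(Account account) { this.account = account; }
  Account getAccount() { return account; }
}

class RequestContext {
  private final User user;
  RequestContext(User user) { this.user = user; }
  User getUser() { return user; }
}

// Flawed: Statement never uses the context itself; it only digs through it
// (context.getUser().getAccount()) to reach the Account it actually needs.
class Statement {
  private final RequestContext context;
  Statement(RequestContext context) { this.context = context; }
  double balance() { return context.getUser().getAccount().getBalance(); }
}

// Better: ask directly for the thing the class works with.
class TestableStatement {
  private final Account account;
  TestableStatement(Account account) { this.account = account; }
  double balance() { return account.getBalance(); }
}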

Flaw #3: Brittle Global State & Singletons

Warning Signs

  • Adding or using singletons
  • Adding or using static fields or static methods
  • Adding or using static initialization blocks
  • Adding or using registries
  • Adding or using service locators

Flaw #4: Class Does Too Much

Warning Signs

  • Summing up what the class does includes the word “and”
  • Class would be challenging for new team members to read and quickly “get it”
  • Class has fields that are only used in some methods
  • Class has static methods that only operate on parameters

Posted by Alek Icev

As you may know, our core vision is to build "the perfect search engine that would understand exactly what you mean and give back exactly what you want." In order to do that we learn from our data, we learn from the past, and we love Machine Learning. Every day we are trying to answer the following questions.
  • Is this email spam?
  • Is this search result relevant?
  • What product category does that query belong to?
  • What is the ad that users are most likely to click on for the query “flowers”?
  • Is this click fraudulent?
  • Is this ad likely to result in a purchase (not merely a click)?
  • Is this image pornographic?
  • Does this page contain malware?
  • Should this query bring up a maps onebox?
Solving many of these problems requires Machine Learning techniques. For all of them we can build prediction models that will learn from the past and try to give the most precise answers to our users. We use a variety of Machine Learning algorithms at Google, and we are experimenting with numerous old and new advancements in this field in order to find the most accurate, fast, and reliable solution for the different problems that we are attacking. Of course, one of the biggest challenges that we face in the Test Engineering community is how we are going to test these algorithms. The amount of data that Google generates goes beyond all of the known boundaries of the environments where current Machine Learning solutions were crafted and tested. We want to open a discussion around ideas for how to test different online machine learning algorithms. From time to time we will present an algorithm and some ideas on how to test it, and solicit feedback from the wider audience, i.e. try to build a wisdom of the crowds around the testing ideas.
So let's look at the Stochastic Gradient Descent algorithm. Its per-example step nudges each weight against the gradient of the prediction error E, i.e. w_i ← w_i − η·∂E/∂w_i, where X is the set of input values x_i and W is the set of importance factors (weights) w_i, one for every value x_i. A positive weight means that that risk factor increases the probability of the outcome, while a negative weight means that that risk factor decreases the probability of that outcome. t is the target output value, η is the learning rate (its role is to control the degree to which the weights are modified at every iteration), and f(z) is the output generated by the function that maps a large input domain to a small set of output values. The function f(z) in this case is the logistic function:
f(z) = 1 / (1 + e^(-z))
z = x_0·w_0 + x_1·w_1 + x_2·w_2 + ... + x_k·w_k
The logistic function has nice characteristics: it can take any input and squash it to a value between 0 and 1, which is ideal for predicting probabilities of events that depend on multiple factors (x_i), each with a different importance weight (w_i). Stochastic Gradient Descent converges quickly to the optimal minimum of the error (E) that the function makes on its predictions, and even when there are multiple local minima the algorithm converges to the global minimum of the prediction error for this model. So let's go back to the real online world, where we want to give answers (predictions) to our users in milliseconds, and ask how we are going to design automated tests for the Stochastic Gradient Descent algorithm embedded in a live online prediction system. The environment is pretty agile and dynamic: the code is being changed every hour, you want your tests to run on a 24/7 basis, and you want to detect errors upstream in the development process, but you don't want to block development with tests that run for days. On the other side, you want to release new features fast, but the release process has to be error-free (imagine the world with Google being down for 5 minutes; that is a global catastrophe, isn't it?!).
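As a minimal sketch of the per-example update (assuming the usual log-loss gradient, for which the step reduces to w_i += η·(t − f(z))·x_i; this is illustrative code, not Google's production learner):

// One stochastic gradient step at a time for a logistic model.
final class LogisticSgd {
  private final double[] weights;
  private final double learningRate;

  LogisticSgd(int numFeatures, double learningRate) {
    this.weights = new double[numFeatures];
    this.learningRate = learningRate;
  }

  double predict(double[] x) {
    double z = 0.0;
    for (int i = 0; i < weights.length; i++) {
      z += weights[i] * x[i];
    }
    return 1.0 / (1.0 + Math.exp(-z)); // the logistic squashing function f(z)
  }

  // Update the weights using a single example (x, t), with t in {0, 1}.
  void update(double[] x, double t) {
    double error = t - predict(x);
    for (int i = 0; i < weights.length; i++) {
      weights[i] += learningRate * error * x[i];
    }
  }
}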
So let's look at some of the test strategies:
  • Should we try to train the model (the set of importance factors) and test it with a subset of the training data? What if that takes far more than hours, maybe days?
  • Should we try to reduce the set of importance factors (x_i) and get convergence (E → 0) on the reduced model?
  • Should we try to reduce the training data set (the variety of values of X fed to the algorithm), keep the original model, and get convergence at any price?
  • Should we be happy with reducing both the model size and the training set?
  • Are we going to worry about over-fitting in the test environment?
  • Given that the original data is online data and evolves fast, are we going to be satisfied with a fixed test data set, or should we change the input test data frequently? What are the triggers that would make us do so?
  • What else should we do?
Drop us a note, all ideas are more than welcome.


by Miško Hevery

Google Tech Talks
November 13, 2008

ABSTRACT

Clean Code Talk Series
Topic: Global State and Singletons

Speaker: Miško Hevery

Video




Slides

by Miško Hevery

I think of bugs as being classified into three fundamental kinds of bugs.

  • Logical: A logical bug is the most common and classical "bug." This is your "if"s, "loop"s, and other logic in your code. It is by far the most common kind of bug in an application. (Think: it does the wrong thing.)

  • Wiring: A wiring bug is when two different objects are miswired. For example, wiring the first-name field to the last-name field. It could also mean that the output of one object is not what the input of the next object expects. (Think: data gets clobbered on its way to where it is needed.)

  • Rendering: A rendering bug is when the output (typically some UI or a report) does not look right. The key here is that it takes a human to determine what "right" is. (Think: it "looks" wrong.)


NOTE: A word of caution. Some developers think that since they are building UI, everything is a rendering bug! A rendering bug would be that the button text overlaps with the button border. If you click the button and the wrong thing happens, then it is either because you wired it wrong (a wiring problem) or your logic is wrong (a logical bug). Rendering bugs are rare.

Typical Application Distribution (without Testability in Mind)

The first thing to notice about these three bug types is that the probability is not evenly distributed. Not only is the probability uneven, but the cost of finding and fixing them differs as well. (I am sure you know this from experience.) My experience from building web apps tells me that logical bugs are by far the most common, followed by wiring bugs and finally rendering bugs.



Cost of Finding the Bug

Logical bugs are notoriously hard to find. This is because they only show up when the right set of input conditions is present, and finding that magical set of inputs, or reproducing it, tends to be hard. On the other hand, wiring bugs are much easier to spot, since the wiring of the application is mostly fixed. If you made a wiring error, it will show up every time you execute that code, for the most part independent of input conditions. Finally, rendering bugs are the easiest. You simply look at the page and quickly spot that something "looks" off.

Cost of Fixing the Bug

Our experience also tells us how hard it is to fix things. A logical bug is hard to fix, since you need to understand all of the code paths before you know what is wrong and can create a solution. Once the solution is created, it is really hard to be sure that we did not break existing functionality. Wiring problems are much simpler, since they manifest themselves either with an exception or with data in the wrong location. Finally, rendering bugs are easy, since you "look" at the page and immediately know what went wrong and how to fix it. The reason it is easy to fix is that we design our application knowing that the rendering will be constantly changing.


                            Logical      Wiring    Rendering
Probability of Occurrence   High         Medium    Low
Difficulty of Discovering   Difficult    Easy      Trivial
Cost of Fixing              High Cost    Medium    Low

How does testability change the distribution?

It turns out that testable code has an effect on the distribution of the bugs. Testable code needs:


The result of all of this is that the number of wiring bugs is significantly reduced. (So as a percentage we gain logical bugs; however, the total number of bugs decreases.)



The interesting thing to notice is that you can get the benefit of testable code without writing any tests. Testable code is better code! (When I hear people say that they sacrificed "good" code for testability, I know that they don't really understand testable code.)

We Like Writing Unit-Tests

Unit-tests give you the greatest bang for the buck. A unit test focuses on the most common bugs, which are the hardest to track down and the hardest to fix. And a unit test forces you to write testable code, which indirectly helps with wiring bugs. As a result, when writing automated tests for your application we want to focus overwhelmingly on unit tests. Unit-tests are tests which focus on the logic and focus on one class/method at a time.

  • Unit-tests focus on the logical bugs. Unit tests focus on your "if"s and "loop"s; a focused unit test does not directly check the wiring (and certainly not rendering).

  • Unit-tests are focused on a single CUT (class-under-test). This is important, since you want to make sure that unit tests will not get in the way of future refactoring. Unit-tests should HELP refactoring, not PREVENT refactorings. (Again, when I hear people say that tests prevent refactorings, I know that they have not understood what unit-tests are.)

  • Unit-tests do not directly prove that the wiring is OK. They do so only indirectly, by forcing you to write more testable code.

  • Functional tests verify wiring; however, there is a trade-off. You "may" have a hard time refactoring if you have too many functional tests, or if you mix functional and logical tests.


Managing Your Bugs

I like to think of tests as bug management (with the goal of being bug free). Not all types of errors are equally likely, therefore I pick my battles over which tests I focus on. I find that I love unit-tests, but they need to be focused! Once a test starts testing a lot of classes in a single pass I may enjoy high coverage, but it is really hard to figure out what is going on when the test is red. It also may hinder refactorings. I tend to go very easy on functional tests. A single test to prove that things are wired together is good enough for me.

I find that a lot of people claim that they write unit-tests, but upon closer inspection it is a mix of functional (wiring) and unit (logic) tests. This happens because people write tests after the code, and therefore the code is not testable. Hard-to-test code tends to create mockeries. (A mockery is a test which has lots of mocks, and mocks returning other mocks, in order to execute the desired code.) The result of a mockery is that you prove little. Your test is too high-level to assert anything of interest at the method level. These tests are also too intimate with the implementation (the intimacy comes from too many mocked interactions), making any refactoring very painful.
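A hedged sketch of what such a mockery looks like in practice (using Mockito and made-up classes; the post itself names no particular library): the test stubs a chain of mocks returning mocks just to get the code to run, and ends up asserting almost nothing about the logic.

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

interface Database { Object query(String sql); }
interface UserRepository { Database getDatabase(); }
interface ServiceLocator { UserRepository getUserRepository(); }

class Login {
  private final ServiceLocator locator;
  Login(ServiceLocator locator) { this.locator = locator; }
  boolean check(String user) {
    return locator.getUserRepository().getDatabase().query("Select ...") != null;
  }
}

// A "mockery": mocks returning other mocks, coupled to the implementation's
// call chain, proving very little about the actual logic.
class LoginMockeryTest {
  void testCheck() {
    ServiceLocator locator = mock(ServiceLocator.class);
    UserRepository repo = mock(UserRepository.class);
    Database db = mock(Database.class);

    when(locator.getUserRepository()).thenReturn(repo); // a mock returning a mock
    when(repo.getDatabase()).thenReturn(db);            // ...returning another mock
    when(db.query("Select ...")).thenReturn(null);

    new Login(locator).check("bob");
    // The only thing left to assert is the wiring we just stubbed.
  }
}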

If you've got some multi-threaded code, you may have data races in it. Data races are hard to find and reproduce – usually they will not occur in testing but will fire once a month in production.

For example, you ask each of your two interns to bring you a bottle of beer. This will usually result in your getting two bottles (perhaps empty), but in the rare situation where the interns collide near the fridge, you may get fewer bottles.

 4 int bottles_of_beer = 0;
 5 void Intern1() { bottles_of_beer++; }  // Intern1 forgot to use a Mutex.
 6 void Intern2() { bottles_of_beer++; }  // Intern2 copied from Intern1.
 7 int main() {
 8   // Folks, bring me one bottle of beer each, please.
 9   ClosureThread intern1(NewPermanentCallback(Intern1)),
10                 intern2(NewPermanentCallback(Intern2));
11   intern1.SetJoinable(true); intern2.SetJoinable(true);
12   intern1.Start(); intern2.Start();
13   intern1.Join(); intern2.Join();
14   CHECK_EQ(2, bottles_of_beer) << "Who didn't bring me my beer!?";
15 }



Want to find data races in your code? Run your program under Helgrind!

$ helgrind path/to/your/program
Possible data race during read of size 4 at 0x5429C8
  at 0x400523: Intern2() tott.cc:6
  by 0x400913: _FunctionResultCallback_0_0::Run() ...
  by 0x4026BB: ClosureThread::Run() ...
  ...
  Location 0x5429C8 has never been protected by any lock
  Location 0x5429C8 is 0 bytes inside global var "bottles_of_beer"
  declared at tott.cc:4



Helgrind will also detect deadlocks for you.

Helgrind is a tool based on Valgrind. Valgrind is a binary translation framework which has other useful tools such as a memory debugger and a cache simulator. Related TotT episodes will follow.

No beer was wasted in the making of this TotT.

Remember to download this episode of Testing on the Toilet and post it in your office.

by Miško Hevery

Google Tech Talks
November 6, 2008

ABSTRACT

Clean Code Talk Series
Topic: Don't Look For Things!

Speaker: Miško Hevery

Video



Slides

by Miško Hevery

Google Tech Talks
October 30, 2008

ABSTRACT

Clean Code Talks - Unit Testing

Speaker: Misko Hevery

Video



Slides

Posted by Patricia Legaspi, Test Engineering Manager

One of the challenges of automation is achieving complete automation. Ideally, complete or total automation would not require any human intervention or verification, yet this is a difficult level to achieve. Investing engineering time to completely automate tests is expensive and often has diminishing returns. Rather than trying to achieve complete automation, time is better spent finding ways to make the most of both the automated tests and the human time.

Effective test report
Consider an automated UI test... Routinely, automated UI tests result in false negatives due to timing issues, the complexity of the steps taken, and other factors. These have to be investigated, which can be time consuming. A thorough report can be created for automated UI tests to present log information, error reports, screenshots of the state of the application when the test failed, and an easy way to re-run the test in question. A human can make use of the information provided and effectively investigate the possible issue. If we can reduce the amount of work that someone has to do to investigate a test failure, the UI tests become more valuable and the human plays a much more important role as a "verifier." Had the report not provided any information for the failed test, the human would have spent, in some cases, hours investigating the issue rather than continuing to automate or run exploratory tests, which would be of more value. Is there a way to maximize what people do well, while also maximizing the use of automation?

Applying human intelligence
What about tests that require a human eye? There are many arguments about tests that require a human to make a judgment about the appearance of rendered UI objects. Machines are great at running tests that return a firm pass or fail, but for tests that require opinion or judgment, the human has an advantage. We've been experimenting with image comparison tests. Screenshots of a golden version of the application under test are compared to screenshots of the current release candidate to verify that the application continues to render properly and that the UI "looks good". Although image comparison techniques can determine whether the screenshots are different, they cannot help determine whether the difference is "important". The human comes back into the picture to complement the automated test. To make effective use of a person's time, the test results should be organized and well presented. For image comparison tests, the key is a report which shows the golden and release-candidate screenshots side by side with clearly highlighted differences and which allows you to replace the golden screenshot. A tester can quickly navigate the reported differences and determine whether they are actual failures or acceptable differences. Frequently, an application's UI will change, which will result in expected image differences. If the differences are expected changes, the tester should be able to replace the golden with the new image with the click of a button for future comparisons.

Aside from test automation, efficiency can also be improved by streamlining the various components of the process. This includes test environment setup, test data retrieval, and test execution. Release cycles tighten, and so should the testing processes. Automating the environment setup can be a major time saver and allow for added time to run in-depth tests. Creating continuous builds to run automated tests ahead of time and preparing environments overnight results in added testing time.

Automated tests are rarely bulletproof; they are prone to errors and false failures. In some cases, creating an infrastructure to make test analysis easy for humans is much more effective than trying to engineer tests to automatically "understand" the UI. We coined this "Partial Automation." We found that having a person in the loop dramatically reduces the time spent trying to over-engineer the automated tests. Obviously, there are trade-offs to this approach and one size doesn't fit all. We believe that automation does not always need to mean complete automation; we automate as much as we can and consider how human judgment can be used efficiently.

Many modules must access elements of their environment that are too heavyweight for use in tests, for example, the file system or network. To keep tests lightweight, we mock out these elements. But what if no mockable interface is available, or the existing interfaces pull in extraneous dependencies? In such cases we can introduce a mediator interface that's directly associated with your module (usually as a public inner class). We call this mediator an "Env" (for environment); this name helps readers of your class recognize the purpose of this interface.

For example, consider a class that cleans the file system underlying a storage system:

// Deletes files that are no longer reachable via our storage system's
// metadata.
class FileCleaner {
 public:
  class Env {
   public:
    virtual bool MatchFiles(const char* pattern, vector<string>* filenames) = 0;
    virtual bool BulkDelete(const vector<string>& filenames) = 0;
    virtual MetadataReader* NewMetadataReader() = 0;
    virtual ~Env();
  };
  // Constructs a FileCleaner. Uses "env" to access files and metadata.
  FileCleaner(Env* env, QuotaManager* qm);
  // Deletes files that are not reachable via metadata.
  // Returns true on success.
  bool CleanOnce();
};



FileCleaner::Env lets us test FileCleaner without accessing the real file system or metadata. It also makes it easy to simulate various kinds of failures, for example, of the file system:

class NoFileSystemEnv : public FileCleaner::Env {
 public:
  NoFileSystemEnv() : match_files_called_(false) {}

  virtual bool MatchFiles(const char* pattern, vector<string>* filenames) {
    match_files_called_ = true;
    return false;
  }
  ...
  bool match_files_called_;  // Records whether MatchFiles() was invoked.
};

TEST(FileCleanerTest, FileCleaningFailsWhenFileSystemFails) {
  NoFileSystemEnv* env = new NoFileSystemEnv();
  FileCleaner cleaner(env, new MockQuotaManager());
  ASSERT_FALSE(cleaner.CleanOnce());
  ASSERT_TRUE(env->match_files_called_);
}



An Env object is particularly useful for restricting access to other modules or systems, for example, when those modules have overly-wide interfaces. This has the additional benefit of reducing your class's dependencies. However, be careful to keep the “real” Env implementation simple, lest you introduce hard-to-find bugs in the Env. The methods of your “real” Env implementation should just delegate to other, well-tested methods.

The most important benefits of an Env are that it documents how your class accesses its environment and it encourages future modifications to your module to keep tests small by extending and mocking out the Env.


Remember to download this episode of Testing on the Toilet and post it in your office.



So you're working on TheFinalApp - the ultimate end-user application, with lots of good features and a really neat GUI. You have a team that's keen on testing and a level of unit test coverage that others only dream of. The star of the show is your suite of automatic GUI end-to-end tests — your team doesn't have to manually test every release candidate.

Life would be good if only the GUI tests weren't so flaky. Every once in a while, your test case clicks a menu item too early, while the menu is still opening. Or it double-clicks to open a tree node, tries to verify the opened node too early, then retries, which closes the node (oops). You have tried adding sleep statements, which has helped somewhat, but has also slowed down your tests.

Why all this pain? Because GUIs are not designed to synchronize with other computer programs. They are designed to synchronize with human beings, who are not like computers:

  • Humans act much more slowly. Well-honed GUI test robots drive GUIs at near theoretical maximum speed.

  • Humans are much better at observing the GUI, and they react intelligently to what they see.

  • Humans extract more meaningful information from a GUI.


In contrast to testing a server, where you usually find enough methods or messages in the server API to synchronize the testing with the server, a GUI application usually lacks these means of synchronization. As a result, a running automated GUI test often consists of one long sequence of race conditions between the automated test and the application under test.

GUI test synchronization boils down to the question: Is the app under test finished with what it's doing? "What it's doing" may be small, like displaying a combo box, or big, like a business transaction. Whatever "it" is, the test must be able to tell whether "it" is finished. Maybe you want to test something while "it" is underway, like verify that the browser icon is rotating while a page is loading. Maybe you want to deliberately click the "Submit" button again in the middle of a transaction to verify that nothing bad happens. But usually, you want to wait until "it" is done.

How to find out whether "it" is done? Ask! Let your test case ask your GUI app. In other words: provide one or several test hooks suitable for your synchronization needs.

The questions to ask depend on the type, platform, and architecture of your application. Here are three questions that worked for me when dealing with a single-threaded Win32 MFC database app:

The first is a question for the OS. The Win32 API provides a function to wait while a process has pending input events:
DWORD WaitForInputIdle(HANDLE hProcess, DWORD dwMilliseconds). Choosing the shortest possible timeout (dwMilliseconds = 1) effectively turns this from a wait-for to a check-if function, so you can explicitly control the waiting loop; for example, to combine several different check functions. Reasoning: If the GUI app has pending input, it's surely not ready for new input.

The second question is: Is the GUI app's message queue empty? I did this with a test hook, in this case a WM_USER message; it could perhaps also be done by calling PeekMessage() in the GUI app's process context via CreateRemoteThread(). Reasoning: If the GUI app still has messages in its queue, it's not yet ready for new input.

The third is more like sending a probe than a question, but again using a test hook. The test framework resets a certain flag in the GUI app (synchronously) and then (asynchronously) posts a WM_USER message into the app's message queue that, upon being processed, sets this flag. Now the test framework checks periodically (and synchronously again) to see whether the flag has been set. Once it has, you know the posted message has been processed. Reasoning: When the posted message (the probe) has been processed, then surely messages and events sent earlier to the GUI app have been processed. Of course, for multi-threaded applications this might be more complex.
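The same "probe" idea carries over to other UI toolkits. As a hedged illustration outside the Win32 world (my example, not the author's), a Java Swing test can post a no-op task to the event dispatch thread and wait for it; once the probe has run, every event posted before it must have been processed:

import java.lang.reflect.InvocationTargetException;
import javax.swing.SwingUtilities;

final class GuiSync {
  private GuiSync() {}

  // Blocks the calling test thread until the Swing event dispatch thread has
  // processed every event that was queued before this call -- our "probe".
  static void waitForIdleGui() throws InterruptedException, InvocationTargetException {
    SwingUtilities.invokeAndWait(new Runnable() {
      public void run() {
        // Intentionally empty: reaching this point is the signal.
      }
    });
  }
}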

These three synchronization techniques resulted in fast and stable test execution, without any test flakiness due to timing issues. All without sleeps, except in the synchronization loop.

Applying this idea to different platforms requires finding the right questions to ask and the right way to ask them. I'd be interested to hear if someone has done something similar, e.g. for an Ajax application. A query into the server to check if any XML responses are pending, perhaps?

Testability Explorer: Using Byte-Code Analysis to Engineer Lasting Social Changes in an Organization’s Software Development Process. (Or How to Get Developers to Write Testable Code)

Presented at OOPSLA 2008 by Miško Hevery, a Best Practices Coach @ Google

Abstract

Testability Explorer is an open-source tool that identifies hard-to-test Java code. Testability Explorer provides a repeatable, objective metric of "testability." This metric becomes a key component of engineering a social change within an organization of developers. The Testability Explorer report provides actionable information to developers, which can be used as (1) a measure of progress towards a goal and (2) a guide to refactoring towards a more testable code-base.
Keywords: unit-testing; testability; refactoring; byte-code analysis; social engineering.

1. Testability Explorer Overview

In order to unit-test a class, it is important that the class can be instantiated in isolation as part of a unit-test. The most common pitfalls of testing are (1) mixing object-graph instantiation with application-logic and (2) relying on global state. The Testability Explorer can point out both of these pitfalls.

1.1 Non-Mockable Cyclomatic Complexity

Cyclomatic complexity is a count of all possible paths through a code-base. For example, a main method will have a large cyclomatic complexity, since it is the sum of all of the conditionals in the application. To limit the size of the cyclomatic complexity in a test, a common practice is to replace the collaborators of the class-under-test with mocks, stubs, or other test doubles.

Let's define "non-mockable cyclomatic complexity" as what is left when the class-under-test has all of its accessible collaborators replaced with mocks. A code-base where the responsibilities of object creation and application logic are separated (using Dependency Injection) will have a high degree of accessible collaborators; as a result, most of its collaborators will easily be replaceable with mocks, leaving only the cyclomatic complexity of the class-under-test behind.

In applications where the class-under-test is responsible for instantiating its own collaborators, these collaborators will not be accessible to the test and as a result will not be replaceable with mocks. (There is no place to inject test doubles.) In such classes the cyclomatic complexity will be the sum of the class-under-test and its non-mockable collaborators.

The higher the non-mockable cyclomatic complexity, the harder it will be to write a unit-test. Each non-mockable conditional translates to a single unit of cost on the Testability Explorer report. The cost of static class initialization and class construction is automatically included for each method, since a class needs to be instantiated before it can be exercised in a test.

1.2 Transitive Global-State

Good unit-tests can be run in parallel and in any order. To achieve this, the tests need to be well isolated. This implies that only the stimulus from the test has an effect on the code execution; in other words, there is no global state.

Global state has a transitive property. If a global variable refers to a class, then all of the references of that class (and all of its references) are globally accessible as well. Each globally accessible variable that is not final results in a cost of ten units on the Testability Explorer report.
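A small made-up example of that transitive property: the single mutable static field below makes the whole object graph behind it reachable (and mutable) from anywhere, which is what the ten-unit cost penalizes.

final class ConnectionPool {
  int openConnections;  // mutable state
}

final class Registry {
  // One global, non-final reference...
  static ConnectionPool pool = new ConnectionPool();
}

class AnyClassAnywhere {
  void doSomething() {
    // ...transitively exposes everything it refers to: any code, in any
    // test, running in any order, can observe or mutate this state.
    Registry.pool.openConnections++;
  }
}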

2. Testability Explorer Report

A chain is only as strong as its weakest link. Therefore the cost of testing a class is equal to the cost of the class's costliest method. In the same spirit, the application's overall testability is defined in terms of a few un-testable classes rather than a large number of testable ones. For this reason, when computing the overall score of a project, the un-testable classes are weighted more heavily than the testable ones.

3. How to Interpret the Report

By default the classes are categorized into three categories: "Excellent" (green) for classes whose cost is below 50; "Good" (yellow) for classes whose cost is below 100; and "Needs work" (red) for all other classes. For convenience the data is presented as both a pie chart and a histogram distribution, and the overall (weighted average) cost is shown on a dial.

[-]ClassRepository [ 323 ]
[-]ClassInfo getClass(String) [ 323 ]
line 51:
ClassInfo parseClass(InputStream) [318]
InputStream inputStreamForClass(String) [2]
[-]ClassInfo parseClass(InputStream) [318]
line 77: void accept(ClassVisitor, int) [302]
line 75: ClassReader(InputStream) [15]

Clicking on the class ClassRepository allows one to drill down into the classes to get more information. For example, the above report shows that ClassRepository has a high cost, 318 of which comes from the parseClass(InputStream) method. Looking closer, we see that the cost comes from line 77 and an invocation of the accept() method.

73: ClassInfo parseClass(InputStream is) {
74:   try {
75:     ClassReader reader = new ClassReader(is);
76:     ClassBuilder visitor = new ClassBuilder(this);
77:     reader.accept(visitor, 0);
78:     return visitor.getClassInfo();
79:   } catch (IOException e) {
80:     throw new RuntimeException(e);
81:   }
82: }

As you can see from the code, the ClassReader can never be replaced with a mock, and as a result the cyclomatic complexity of the accept method cannot be avoided in a test — resulting in a high testability cost. (Injecting the ClassReader would solve this problem and make the class more testable.)

4. Social Engineering

In order to produce a lasting change in the behavior of developers, it helps to have a measurable number that answers where the project is and where it should be. Such information can provide insight into whether the project is getting closer to or farther from its testability goal.

People respond to what is measured. Integrating the Testability Explorer with the project's continuous build and publishing the report together with the build artifacts communicates the importance of testability to the team. Publishing a graph of the overall score over time allows the team to see changes on a per-check-in basis.

If Testability Explorer is used to identify the areas of code that need to be refactored, one can then compute the rate of improvement, project an expected date of refactoring completion, and create a sense of competition among the team members.

It is even possible to set up a unit test around Testability Explorer that will only allow classes whose testability cost is better than some predetermined threshold.

by Miško Hevery

After reading the article on Singletons (the design anti-pattern), how they are really global variables, and the dependency injection suggestion to simply pass in the reference to the singleton in a constructor (instead of looking it up in global state), many people incorrectly concluded that they would now have to pass the singleton all over the place. Let me demonstrate the myth with the following example.

Let's say that you have a LoginPage which uses the UserRepository for authentication. The UserRepository in turn uses the Database Singleton to get hold of the global reference to the database connection, like this:

class UserRepository {
  private static final String BY_USERNAME_SQL = "Select ...";

  User loadUser(String username) {
    Database db = Database.getInstance();
    return db.query(BY_USERNAME_SQL, username);
  }
}

class LoginPage {
  UserRepository repo = new UserRepository();

  void login(String username, String pwd) {
    User user = repo.loadUser(username);
    if (user == null || !user.checkPassword(pwd)) {
      throw new IllegalLoginException();
    }
  }
}

The first thought is that if you follow the advice of dependency injection you will have to pass in the Database into the LoginPage just so you can pass it to the UserRepository. The argument goes that this kind of coding will make the code hard to maintain, and understand. Let's see what it would look like after we get rid of the global variable Singleton Database look up.

First, lets have a look at the UserRepository.

class UserRepository {
  private static final String BY_USERNAME_SQL = "Select ...";
  private final Database db;

  UserRepository(Database db) {
    this.db = db;
  }

  User loadUser(String username) {
    return db.query(BY_USERNAME_SQL, username);
  }
}

Notice how the removal of the Singleton global look-up has cleaned up the code. This code is now easy to test, since in a test we can instantiate a new UserRepository and pass a fake database connection into the constructor. This improves testability. Before, we had no way to intercept the calls to the Database and hence could never test against a Database fake. Not only did we have no way of intercepting the calls to the Database, we did not even know by looking at the API that the Database was involved. (See Singletons are Pathological Liars.) I hope everyone would agree that this change of explicitly passing in a Database reference greatly improves the code.

Now let's look at what happens to the LoginPage...

class LoginPage {
  UserRepository repo;

  LoginPage(Database db) {
    repo = new UserRepository(db);
  }

  void login(String username, String pwd) {
    User user = repo.loadUser(username);
    if (user == null || !user.checkPassword(pwd)) {
      throw new IllegalLoginException();
    }
  }
}

Since UserRepository can no longer do a global look-up to get hold of the Database, it must ask for it in the constructor. Since LoginPage is doing the construction, it now needs to ask for the Database so that it can pass it to the constructor of the UserRepository. The myth we are describing here says that this makes the code hard to understand and maintain. Guess what?! The myth is correct! The code as it stands now is hard to maintain and understand. Does that mean that dependency injection is wrong? NO! It means that you have only done half of the work! In how to think about the new operator we go into the details of why it is important to separate your business logic from the new operators. Notice how the LoginPage violates this: it calls new on UserRepository. The issue here is that LoginPage is breaking the Law of Demeter. LoginPage is asking for the Database even though it itself has no need for the Database (this greatly hinders testability, as explained here). You can tell, since LoginPage does not invoke any method on the Database. This code, like the myth suggests, is bad! So how do we fix it?

We fix it by doing more Dependency Injection.

class LoginPage {
  UserRepository repo;

  LoginPage(UserRepository repo) {
    this.repo = repo;
  }

  void login(String username, String pwd) {
    User user = repo.loadUser(username);
    if (user == null || !user.checkPassword(pwd)) {
      throw new IllegalLoginException();
    }
  }
}

LoginPage needs a UserRepository. So instead of trying to construct the UserRepository itself, it should simply ask for the UserRepository in the constructor. The fact that UserRepository needs a reference to the Database is not a concern of the LoginPage. Neither is it a concern of LoginPage how to construct a UserRepository. Notice how this LoginPage is now cleaner and easier to test. To test it we can simply instantiate a LoginPage and pass in a fake UserRepository, with which we can emulate what happens on a successful login as well as on an unsuccessful login and/or exceptions. It also nicely takes care of the concern behind this myth. Notice that every object simply knows about the objects it directly interacts with. There is no passing of object references around just to get them into the right location where they are needed. If you find yourself in the situation the myth describes, then all it means is that you have not fully applied dependency injection.

So here are the two rules of Dependency Injection:

  • Always ask for a reference! (Don't create a reference, or look one up in global space, aka the Singleton design anti-pattern.)
  • If you are asking for something which you are not directly using, then you are violating the Law of Demeter. Which really means that you are either trying to create the object yourself, or the parameter should be passed in through the constructor instead of through a method call. (We can go more into this in another blog post.)

So where have all the new operators gone, you ask? Well, we have already answered that question here. And with that I hope we have put the myth to rest!

BTW, for those of you who are wondering why this is a common misconception, the reason is that people incorrectly assume that the constructor dependency graph and the call graph are inherently identical (see this post). If you construct your objects in-line (as most developers do; see thinking about the new operator), then yes, the two graphs are very similar. However, if you separate the object graph instantiation from its execution, then the two graphs are independent. This independence is what allows us to inject the dependencies directly where they are needed, without passing the reference through the intermediary collaborators.

By Roshan Sembacuttiaratchy, Software Engineer in Test, Google Zurich, Switzerland

When I tell people that I'm a Test Engineer at Google, I get a confused look from them and questions as to what I actually do. Do I sit in front of a keyboard clicking every button and link on screen? Do I watch over the infinite number of monkeys, checking to make sure they produce Shakespeare's work while clicking said links? Or do we create better and smarter pigeons? Well, it's actually a bit of everything, but not quite in the way you might think at first.

The people working in Test Engineering are software developers with a passion for testing and test automation. Whereas most software engineers at Google are working on developing products, Test Engineering's objective is to help improve the quality of the projects we're involved in. We work integrated with the project team, developing test frameworks and setting up test systems. One of our key mantras is automation, so our first task whenever we're assigned to a new project is to evaluate the current state of test automation, identify what we could do to improve the results obtained, reduce the total run time, and make the best use of available resources. Load and performance testing is another important task we perform, along with analysis of the results. We're also not afraid to get our hands dirty by diving deep into the project's code to identify what re-factoring might be useful to help test the system better, creating and introducing stubs, fakes and mocks as necessary. As you might guess, we're big fans of Test Driven Design, Continuous Integration and other agile techniques and play the role of evangelists, spreading the word on new tools, techniques and best practices.

Individual project assignments tend to be limited in duration, as we're a relatively small group of people in Test Engineering, helping support a much larger group of software developers. So once we've completed our tasks for one project, we move on to the next, working with a different team of people and a different set of problems, changing projects every few months.

That pretty much describes what I do, and how our team in Zurich works. So to answer the questions posed initially, I write software which performs the task of clicking every button and every link, and which then verifies the result of these actions. I run this on a few hundred machines to mimic virtual monkeys, and hand it over to the development team so that they can use it to check that their product is working, even after I've left their project. And if it produces the works of Shakespeare, that's just an added bonus! :-)

If your code manipulates floating-point values, your tests will probably involve floating-point values as well.

When comparing floating-point values, checking for equality might lead to unexpected results. Rounding errors can lead to a result that is close to the expected one, but not equal. As a consequence, an assertion might fail when checking for equality of two floating-point quantities even if the program is implemented correctly.
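For instance (a tiny Java/JUnit illustration added here, not from the episode itself), 0.1 + 0.2 does not compare equal to 0.3 in double arithmetic, but a comparison with a small delta passes:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;

import org.junit.Test;

public class FloatingPointComparisonTest {
  @Test
  public void exactEqualityFailsButDeltaComparisonPasses() {
    double sum = 0.1 + 0.2;        // actually 0.30000000000000004
    assertFalse(sum == 0.3);       // exact comparison is false
    assertEquals(0.3, sum, 1e-9);  // comparison within a tolerance passes
  }
}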

The Google C++ Testing Framework provides functions for comparing two floating-point quantities up to a given precision.

In C++, you can use the following macros:

ASSERT_FLOAT_EQ(expected, actual);
ASSERT_DOUBLE_EQ(expected, actual);
EXPECT_FLOAT_EQ(expected, actual);
EXPECT_DOUBLE_EQ(expected, actual);



In Java, JUnit overloads Assert.assertEquals for floating-point types:

assertEquals(float expected, float actual, float delta);
assertEquals(double expected, double actual, double delta);



An example (in C++):

TEST(SquareRootTest, CorrectResultForPositiveNumbers) {
  EXPECT_FLOAT_EQ(2.0f, FloatSquareRoot(4.0f));
  EXPECT_FLOAT_EQ(23.3333f, FloatSquareRoot(544.44444f));
  EXPECT_DOUBLE_EQ(2.0, DoubleSquareRoot(4.0));
  EXPECT_DOUBLE_EQ(23.33333333333333, DoubleSquareRoot(544.44444444444444));
    // the above succeeds
  EXPECT_EQ(2.0, DoubleSquareRoot(4.0));
    // the above fails
  EXPECT_EQ(23.33333333333333, DoubleSquareRoot(544.44444444444444));
}



Remember to download this episode of Testing on the Toilet and post it in your office.