[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CustomSearch: NoHttpResponseException exception if interval between requests is 240+ seconds #1791

Closed
zhemaituk opened this issue Dec 8, 2022 · 14 comments
Labels
priority: p3 Desirable enhancement or fix. May not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@zhemaituk
Copy link
zhemaituk commented Dec 8, 2022

Environment details

OS type and version: macOS 13.0.1
Java version: jdk1.8.0_251.jdk
google-http-client-apache-v2-1.42.3

Steps to reproduce

  1. Create new ApacheHttpTransport
  2. Make a request to https://customsearch.googleapis.com/customsearch/v1
  3. Wait at least 240 seconds
  4. Make another request to https://customsearch.googleapis.com/customsearch/v1

Code example

try (CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(new BasicHttpClientConnectionManager())
                .build()) {

            Map<String, String> params = new LinkedHashMap<>();
            params.put("key", GOOGLE_API_KEY); // Reproducible for invalid key as well
            params.put("cx", "02395f9f6867a1988");
            params.put("q", LONG_QUERY); // Any query longer than 2048 characters
            params.put("start", "1");

            HttpTransport transport = new ApacheHttpTransport(httpClient);
            final HttpRequestFactory requestFactory = transport.createRequestFactory();

            final HttpRequest request1 = requestFactory.buildPostRequest(
                    new GenericUrl("https://customsearch.googleapis.com/customsearch/v1"),
                    new UrlEncodedContent(params)
            );
            request1.getHeaders().set("X-HTTP-Method-Override", "GET");

            try {
                System.out.println(request1.execute().parseAsString());
            } catch (HttpResponseException ignore) {
                // Fails in case of invalid key. Safe to ignore for the sake of this test.
            }

            Thread.sleep(5 * 60 * 1000);
            params.put("start", "11");

            final HttpRequest request2 = requestFactory.buildPostRequest(
                    new GenericUrl("https://customsearch.googleapis.com/customsearch/v1"),
                    new UrlEncodedContent(params)
            );
            request2.getHeaders().set("X-HTTP-Method-Override", "GET");

            try {
                System.out.println(request2.execute().parseAsString());
            } catch (HttpResponseException ignore) {
                // Fails in case of invalid key. Safe to ignore for the sake of this test.
            }
}

Stack trace

org.apache.http.NoHttpResponseException: customsearch.googleapis.com:443 failed to respond

	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.google.api.client.http.apache.v2.ApacheHttpRequest.execute(ApacheHttpRequest.java:73)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012)
	at com.example.NoHttpResponseTest.testDelayBetweenRequestsDirectGoogleApi(NoHttpResponseTest.java:136)

Any additional information below

Same error when using google-api-services-customsearch-v1-rev20210918-2.0.0 CustomSearchAPI.
No error, all completes fine, if using apache's CloseableHttpClient directly to make the same requests.

@lqiu96
Copy link
lqiu96 commented Dec 9, 2022

Thanks for reporting this and for the small repro. I'll take a look when I get a bit of time.

Can you explain this part to me?

No error, all completes fine, if using apache's CloseableHttpClient directly to make the same requests.

What did you mean by directly?

@lqiu96 lqiu96 added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Dec 9, 2022
@zhemaituk
Copy link
Author

@lqiu96 thanks for taking a look. Directly = using apache's http client, without google http transport wrapper.
Sample code:

    /**
     * This test performs API requests directly via Apache Http Client (same configuration as the first test).
     * Unlike the first test, this one passes successfully.
     */
    @Test
    void testDelayBetweenRequestsDirectApacheHttpClient() throws Exception {
        try (CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(new BasicHttpClientConnectionManager())
                .build()) {

            Map<String, String> params = new LinkedHashMap<>();
            params.put("key", GOOGLE_API_KEY);
            params.put("cx", "02395f9f6867a1988");
            params.put("q", LONG_QUERY);
            params.put("start", "1");

            List<NameValuePair> nameValuePairs = params.entrySet().stream().map(e -> new BasicNameValuePair(e.getKey(), e.getValue())).collect(Collectors.toList());
            HttpPost request = new HttpPost("https://customsearch.googleapis.com/customsearch/v1");
            request.addHeader("X-HTTP-Method-Override", "GET");

            request.setEntity(new UrlEncodedFormEntity(nameValuePairs, StandardCharsets.UTF_8));

            try (CloseableHttpResponse page1 = httpClient.execute(request)) {
                readAll(page1);
            }

            Thread.sleep(5 * 60 * 1000);
            params.put("start", "11");
            List<NameValuePair> nameValuePairs2 = params.entrySet().stream().map(e -> new BasicNameValuePair(e.getKey(), e.getValue())).collect(Collectors.toList());
            HttpPost request2 = new HttpPost("https://customsearch.googleapis.com/customsearch/v1");
            request2.addHeader("X-HTTP-Method-Override", "GET");

            request2.setEntity(new UrlEncodedFormEntity(nameValuePairs2, StandardCharsets.UTF_8));

            try (CloseableHttpResponse page2 = httpClient.execute(request2)) {
                readAll(page2);
            }
        }
    }

    private static String readAll(CloseableHttpResponse page) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int ch; (ch = page.getEntity().getContent().read()) != -1; ) {
            sb.append((char) ch);
        }
        return sb.toString();
    }

@lqiu96
Copy link
lqiu96 commented Dec 13, 2022

Hi @zhemaituk. I took look at the samples you provided and wasn't able to find a root cause yet -- I was able to reproduce.

I took a look at BasicHttpClientConnectionManager and saw that it was A connection manager for a single connection. This connection manager maintains only one active connection.. My guess right now is that our implementation isn't dealing with properly closing the connection once we get the result or some mechanism with keep-alive or TTL is keeping this connection as active. This seems odd since even calling disconnect() on the HttpResponse isn't solving this.

I'ved played around with using new PoolingHttpClientConnectionManager() or using new NoConnectionReuseStrategy():

try (CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(new BasicHttpClientConnectionManager())
                .setConnectionReuseStrategy(new NoConnectionReuseStrategy())
                .build()) {

which seems like it may help with resolving this test. Though, do let me know otherwise.

I'll need to take another look later today and dig deeper.

@zhemaituk
Copy link
Author

Yeah, I switched to PoolingHttpClientConnectionManager for now, but wondering if the issue possibility is still there, just occurring much much less often.

@lqiu96
Copy link
lqiu96 commented Dec 21, 2022

Yeah, I switched to PoolingHttpClientConnectionManager for now, but wondering if the issue possibility is still there, just occurring much much less often.

Apologies, I haven't had much time to look into this recently. This is on my TODO list to investigate, but do let me know if you are facing any pressing/ blocking issues!

From my suspicions above, I think it should resolve your issues. I'm not too familiar with this library and I need a bit more time to give you the proof 😄

@lqiu96
Copy link
lqiu96 commented Dec 27, 2022

Hi, I've done a bit more digging and I have some thoughts. I'm not an expert with the Apache Http Library so I can only provide my guesses from my understanding -- I've added logging from some debugging below:

Google Client:

  1. Sleep for 5 min:
2022/12/26 16:11:53:132 EST [DEBUG] DefaultManagedHttpClientConnection - http-outgoing-0: set socket timeout to 0
2022/12/26 16:11:53:132 EST [DEBUG] BasicHttpClientConnectionManager - Connection can be kept alive indefinitely
Sleeping for 5 min
2022/12/26 16:16:53:142 EST [DEBUG] RequestAddCookies - CookieSpec selected: default
2022/12/26 16:16:53:142 EST [DEBUG] RequestAuthCache - Auth cache not set in the context
2022/12/26 16:16:53:142 EST [DEBUG] BasicHttpClientConnectionManager - Get connection for route {s}->https://customsearch.googleapis.com:443
...

Result: Fail

  1. Sleep for 2 min:
2022/12/26 15:56:32:439 EST [DEBUG] DefaultManagedHttpClientConnection - http-outgoing-0: set socket timeout to 0
2022/12/26 15:56:32:439 EST [DEBUG] BasicHttpClientConnectionManager - Connection can be kept alive indefinitely
Sleeping for 2 min
2022/12/26 15:58:32:448 EST [DEBUG] RequestAddCookies - CookieSpec selected: default
2022/12/26 15:58:32:449 EST [DEBUG] RequestAuthCache - Auth cache not set in the context
2022/12/26 15:58:32:449 EST [DEBUG] BasicHttpClientConnectionManager - Get connection for route {s}->https://customsearch.googleapis.com:443
...

Result: Pass

  1. Sleep for 5 min + Closing the connection manually:
2022/12/26 16:18:57:592 EST [DEBUG] DefaultManagedHttpClientConnection - http-outgoing-0: set socket timeout to 0
2022/12/26 16:18:57:592 EST [DEBUG] BasicHttpClientConnectionManager - Connection can be kept alive indefinitely
Sleeping for 5 min
2022/12/26 16:23:57:600 EST [DEBUG] BasicHttpClientConnectionManager - Closing connection
2022/12/26 16:23:57:601 EST [DEBUG] DefaultManagedHttpClientConnection - http-outgoing-0: Close connection
2022/12/26 16:23:57:609 EST [DEBUG] RequestAddCookies - CookieSpec selected: default
2022/12/26 16:23:57:609 EST [DEBUG] RequestAuthCache - Auth cache not set in the context
2022/12/26 16:23:57:609 EST [DEBUG] BasicHttpClientConnectionManager - Get connection for route {s}->https://customsearch.googleapis.com:443
...

Result: Pass

Apache Client:

  1. Sleep for 5 min:
2022/12/26 16:00:22:251 EST [DEBUG] DefaultManagedHttpClientConnection - http-outgoing-0: Close connection
2022/12/26 16:00:22:252 EST [DEBUG] MainClientExec - Connection discarded
2022/12/26 16:00:22:252 EST [DEBUG] BasicHttpClientConnectionManager - Releasing connection [Not bound]
Sleeping for 5 mins...
2022/12/26 16:05:22:288 EST [DEBUG] RequestAddCookies - CookieSpec selected: default
2022/12/26 16:05:22:289 EST [DEBUG] RequestAuthCache - Auth cache not set in the context
2022/12/26 16:05:22:289 EST [DEBUG] BasicHttpClientConnectionManager - Get connection for route {s}->https://customsearch.googleapis.com:443
2022/12/26 16:05:22:289 EST [DEBUG] DefaultManagedHttpClientConnection - http-outgoing-1: set socket timeout to 0
2022/12/26 16:05:22:289 EST [DEBUG] MainClientExec - Opening connection {s}->https://customsearch.googleapis.com:443
2022/12/26 16:05:22:319 EST [DEBUG] DefaultHttpClientConnectionOperator - Connecting to customsearch.googleapis.com/142.250.176.202:443
2022/12/26 16:05:22:319 EST [DEBUG] SSLConnectionSocketFactory - Connecting socket to customsearch.googleapis.com/142.250.176.202:443 with timeout 0
...

Result: Pass


From the logs above, it looks like the Apache Http Client has some logic internally to release/ close the connection. Given that there doesn't seem to be any Keep-Alive header that is sent back from the server, I suspect that is a sign in the Apache Http Client to have the connection be released (or there might some other logic that triggers it).

The Google Client seems to want to keep the connection alive and the test with manually closing the connection works to get the result the second time. Because this seems to only happen after 240+ seconds, I suspect that the connection is closed from the server side after ~4 minutes (possibly some server side configuration). The test with 5 minutes is going to try with the already established connection (though no longer valid) and fails to get a valid response.

I think the best bet is to either use a PooledHttpClientConnectionManager (allowing more than 1 connection), manually specify your keep-alive strategy to close a connection if no response after a certain amount of time (https://www.baeldung.com/httpclient-connection-management), or manually close the connection via:

basicHttpClientConnectionManager.closeIdleConnections(0, TimeUnit.MILLISECONDS); or

try (CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(new BasicHttpClientConnectionManager())
                .setConnectionReuseStrategy(new NoConnectionReuseStrategy())
                .build()) {

@zhemaituk
Copy link
Author

@lqiu96 are you able to reproduce the issue with Apache http client alone? If we have a valid usage example which fails - we can log a bug against http-client itself.

@zhemaituk
Copy link
Author

I think I found a way to reproduce with Apache Client.
From what I see Google Client does not close the response, but closes content InputStream only.
If we do the same with Apache Client, the issue is reproducible as well:

            CloseableHttpResponse page1 = httpClient.execute(request);
            readAll(page1);
            page1.getEntity().getContent().close();
//            try (CloseableHttpResponse page1 = httpClient.execute(request)) {
//                readAll(page1);
//            }

So maybe closing the response instead of closing content InputStream would do?

@lqiu96
Copy link
lqiu96 commented Jan 3, 2023

@lqiu96 are you able to reproduce the issue with Apache http client alone? If we have a valid usage example which fails - we can log a bug against http-client itself.

I wasn't able to reproduce the issue with the earlier code above (even with some testing and tinkering). But, I was able to in the latest one where you don't close put the CloseableHttpResponse in try-with resources.

From what I see Google Client does not close the response, but closes content InputStream only. ... So maybe closing the response instead of closing content InputStream would do?

Yeah, I believe that is the case. From my understanding, there is a difference between releasing and closing the connection. Releasing the connection allows it to re-enter the connection pool (i.e. in the case of BasicHttpClientConnectionManager it releases the only usable connection back). This is what the Google Http Client does but doesn't close the connection since there isn't a CloseableHttpResponse.close() being called.

From this helpful Baeldung article: https://www.baeldung.com/httpclient-connection-management#re-use

I think we can mimic the connection close with something like this:
basicHttpClientConnectionManager.closeIdleConnections(0, TimeUnit.MILLISECONDS);

i.e.

            HttpResponse httpResponse = request1.execute();
            try {
                httpResponse.parseAsString();
            } catch (HttpResponseException ignore) {
                // Fails in case of invalid key. Safe to ignore for the sake of this test.
            } finally {
                httpResponse.disconnect();
                basicHttpClientConnectionManager.closeIdleConnections(0, TimeUnit.MILLISECONDS);
            }

Or any of the other suggestions above. Let me know you thoughts.

@zhemaituk
Copy link
Author

Concluding, it sounds like a bug in BasicHttpClientConnectionManager itself.

Running the same Apache Client code with two different connection managers:

PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
connectionManager.setDefaultMaxPerRoute(1);

5 seconds delay (connection is re-used):

2023/01/03 14:58:58:726 EST [DEBUG] PoolingHttpClientConnectionManager - Connection released: [id: 0][route: {s}->https://customsearch.googleapis.com:443][total available: 1; route allocated: 1 of 1; total allocated: 1 of 20]
2023/01/03 14:59:03:731 EST [DEBUG] PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://customsearch.googleapis.com:443][total available: 1; route allocated: 1 of 1; total allocated: 1 of 20]
2023/01/03 14:59:03:733 EST [DEBUG] PoolingHttpClientConnectionManager - Connection leased: [id: 0][route: {s}->https://customsearch.googleapis.com:443][total available: 0; route allocated: 1 of 1; total allocated: 1 of 20]

5 minutes delay (connection is re-established):

2023/01/03 14:51:03:577 EST [DEBUG] PoolingHttpClientConnectionManager - Connection released: [id: 0][route: {s}->https://customsearch.googleapis.com:443][total available: 1; route allocated: 1 of 1; total allocated: 1 of 20]
2023/01/03 14:56:03:576 EST [DEBUG] PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://customsearch.googleapis.com:443][total available: 1; route allocated: 1 of 1; total allocated: 1 of 20]
2023/01/03 14:56:03:578 EST [DEBUG] DefaultManagedHttpClientConnection - http-outgoing-0: Close connection
2023/01/03 14:56:03:578 EST [DEBUG] PoolingHttpClientConnectionManager - Connection leased: [id: 1][route: {s}->https://customsearch.googleapis.com:443][total available: 0; route allocated: 1 of 1; total allocated: 1 of 20]

With BasicHttpClientConnectionManager connectionManager = new BasicHttpClientConnectionManager(); and everything else the same - the 5 minutes delay test scenario fails.

I would say the most suitable workaround is to use PoolingHttpClientConnectionManager for now.

@zhemaituk
Copy link
Author

Logged a ticket for the http client project: https://issues.apache.org/jira/browse/HTTPCLIENT-2255

@lqiu96
Copy link
lqiu96 commented Jan 3, 2023

I would say the most suitable workaround is to use PoolingHttpClientConnectionManager for now.

I think this would work for now and the logs from your comment above do seem to support this.

I wasn't expecting this to work:

PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
connectionManager.setDefaultMaxPerRoute(1);
...
Logs: DefaultManagedHttpClientConnection - http-outgoing-0: Close connection

I would have thought that this would also run into the same issue with the Google Http Client. Given that there is only one connection per route and the connection wasn't closed via some sort of conn.close() (either AutoCloseable or manually invoked).


I do see this in the docs which may explain the behavior (or I may be missing something...):
PoolingHttpClientConnectionManager:
The handling of stale connections was changed in version 4.4. Previously, the code would check every connection by default before re-using it. The code now only checks the connection if the elapsed time since the last use of the connection exceeds the timeout that has been set. The default timeout is set to 2000ms.

vs.
BasicHttpClientConnectionManager:
This connection manager will make an effort to reuse the connection for subsequent requests with the same route. It will, however, close the existing connection and open it for the given route, if the route of the persistent connection does not match that of the connection request. If the connection has been already been allocated IllegalStateException is thrown.

PoolingHttpClientConnectionManager would check the connection and BasicHttpClientConnectionManager wouldn't unless it's a different route.

Concluding, it sounds like a bug in BasicHttpClientConnectionManager itself.
...
Logged a ticket for the http client project: https://issues.apache.org/jira/browse/HTTPCLIENT-2255

I think we've reached the same conclusion now. Thanks for submitting the issue -- Let's see what the Apache HttpClient folks have to say.

@zhemaituk
Copy link
Author

We got the response there - it's not reproducible with httpclient5 (I double checked as well), and won't be fixed for 4.x.
I guess upgrade to httpclient5 is way outside of the scope of this bug.
I'll let you decide if you like to close this ticket, and handle httpclient5 support separately (if there are plans to support it).

@lqiu96
Copy link
lqiu96 commented Jan 6, 2023

Thanks for the update @zhemaituk. I'll close this ticket and create a new issue to start a discussion regarding possibly migrating to Apache HttpClient v5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p3 Desirable enhancement or fix. May not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

2 participants