CDC - Community Distributed Cache

About

CDC (Community Distributed Cache) provides a high-performance, scalable, distributed in-memory cache, based on Hazelcast, for both CDA and Mondrian.

CDC is a Pentaho plugin that provides the following features:

CDA distributed cache support
Mondrian distributed cache support
Ability to switch between the default cache and the CDC cache for CDA and Mondrian
Gracefully handles adding and removing cache nodes
Allows selectively clearing the cache of specific CDE dashboards
Allows selectively clearing the cache of specific Mondrian schemas, cubes or dimensions
Provides an API to clean the cache from the outside (e.g. after running an ETL job)
Provides a view over cluster status
Supports several memory configuration options

Motivation

Performance is a key point not only in business intelligence software but in any user interface. The goal of CDC is to give a Pentaho implementation based on Mondrian / CDA a distributed caching layer that keeps queries from hitting the database as much as possible.

One added feature is the ability to clear the cache of specific Mondrian cubes only. Even though Mondrian has a very complete API to control the member cache, Pentaho only exposes a "clean all" function, which ends up being very limited in production environments.

The cache's ability to survive server restarts is a design bonus and is supported by CDA out of the box. It will be supported by Mondrian as soon as MONDRIAN-1107 is fixed.

Requirements

Mondrian 3.4 or newer (in Pentaho 4.5)
CDA 12.05.15

Usage

It's very simple to configure CDC.

Install CDC using either the installer (soon to be available) or ctools-installer. If you do a manual install, be sure to copy the contents of solution/system/cdc/pentaho/lib to the server's WEB-INF/lib
Download the standalone cache node
Run the standalone cache node (launch-hazelcast.sh) on the same machine as Pentaho or on the same internal network, optionally editing the file to change the memory settings (defaults to 1 GB, increase at will). You can launch as many nodes as you want.
Launch Pentaho and click on the CDC button:

Enable cache usage on CDA and Mondrian
Restart the Pentaho server
Check on the settings screen that the settings are satisfactory. Usually the defaults work fine.

Open Analyzer, JPivot or a CDE dashboard that uses CDA and you should see the cache being populated

Cluster info

Hazelcast has a very good Management Center, so reimplementing that kind of feature is outside the scope of CDC. However, we do support a simple cluster information dashboard that gives an overview of the state of the nodes.

Note about lite nodes: the Pentaho server is itself a cache node. However, it is configured in such a way that it doesn't hold data, hence the term lite node.

Clean cache

With CDC you can selectively control the contents of the cache, cleaning either specific dashboards or specific cubes. The business case is simple: the cache needs to be cleared after new data becomes available (usually as the result of an ETL job). CDC not only allows that, it also allows doing it from within the ETL process.

CDA

CDC offers a solution navigator so that we can select a dashboard. When we select a dashboard, the cached results of all CDA queries used by that dashboard are cleared.

Clicking on the URL button gives us a URL that we can call externally (e.g. from an ETL job). Be aware that you need to add the user credentials when calling from the outside (e.g. &userid=joe&password=password).
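As a rough sketch of how an ETL job might prepare that call, the snippet below appends the userid and password parameters to a URL. The URL shown is only a placeholder standing in for whatever the URL button produces, and the helper name with_credentials is made up for this example.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_credentials(url: str, userid: str, password: str) -> str:
    """Append userid/password query parameters to an existing URL."""
    parts = urlsplit(url)
    extra = urlencode({"userid": userid, "password": password})
    # Keep any query string that is already present on the copied URL.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


# Placeholder only: replace with the URL copied from the URL button.
copied_url = "http://pentaho.example/placeholder?item=demo"
print(with_credentials(copied_url, "joe", "password"))
```

The values are URL-encoded, so credentials containing spaces or special characters are passed safely.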

Mondrian

This one is very similar to the previous one, but navigates through the available cubes. One can then clean the entire schema, a specific cube or even the individual cell cache for a specific dimension (use this last one with care).

Issues, bugs and feature requests

In order to report bugs, issues or feature requests, please use the Webdetails CDC Project Page

TCP46 Issue

There is a particularly nasty known issue that can occur either at startup or when attempting to access Hazelcast (i.e. putting elements in the cache or accessing ClusterInfo). So far this issue has been confirmed on machines running Mac OS X.

diagnosis

If running netstat -a -n shows more than one socket on the same Hazelcast port (by default, ports start at 5701) and these are of different types (i.e. tcp4 and tcp46), you are likely to have this issue.
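The check can be sketched as a small script that scans netstat -a -n output for ports bound by more than one socket type. This is only an illustration: the function name, the upper port bound of 5800 and the decision to look only at lines in the LISTEN state are assumptions of this sketch, and the sample text imitates the Mac OS X netstat layout.

```python
from collections import defaultdict


def mixed_socket_ports(netstat_output: str, min_port: int = 5701, max_port: int = 5800) -> dict:
    """Return {port: socket types} for ports listened on by more than one type."""
    types_by_port = defaultdict(set)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: proto, recv-q, send-q, local address, foreign address, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[-1] != "LISTEN":  # assumption: only listening sockets matter
            continue
        proto, local = fields[0], fields[3]
        # Mac OS X separates the port with '.', Linux with ':'.
        port_text = local.replace(":", ".").rsplit(".", 1)[-1]
        if not port_text.isdigit():
            continue
        port = int(port_text)
        if min_port <= port <= max_port:
            types_by_port[port].add(proto)
    return {port: types for port, types in types_by_port.items() if len(types) > 1}


SAMPLE = """\
tcp4       0      0  *.5701                 *.*                    LISTEN
tcp46      0      0  *.5701                 *.*                    LISTEN
tcp4       0      0  *.5702                 *.*                    LISTEN
"""

print(mixed_socket_ports(SAMPLE))
```

A non-empty result means the same port is held by different socket types, which matches the symptom described above.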

workaround

Make sure the -Djava.net.preferIPv4Stack flag is explicitly set to the same value in both your Pentaho JVM (you can set it in JAVA_OPTS) and the standalone script.
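One way to double-check this is to compare the option strings used on both sides. The sketch below is only an illustration: the helper names and the sample option strings (including the -Xmx values) are invented, and when the flag appears more than once the last occurrence is used, which is an assumption.

```python
import re

FLAG = re.compile(r"-Djava\.net\.preferIPv4Stack=(\S+)")


def prefer_ipv4_value(jvm_options: str):
    """Return the explicit value of -Djava.net.preferIPv4Stack, or None if unset."""
    matches = FLAG.findall(jvm_options)
    # Assumption: if the flag is repeated, the last occurrence is the effective one.
    return matches[-1].lower() if matches else None


def flags_agree(pentaho_opts: str, standalone_opts: str) -> bool:
    """True only when both sides set the flag explicitly and to the same value."""
    pentaho_value = prefer_ipv4_value(pentaho_opts)
    standalone_value = prefer_ipv4_value(standalone_opts)
    return pentaho_value is not None and pentaho_value == standalone_value


# Hypothetical option strings, for illustration only.
pentaho = "-Xmx2g -Djava.net.preferIPv4Stack=true"
standalone = "-Xmx1g -Djava.net.preferIPv4Stack=true"
print(flags_agree(pentaho, standalone))
```

A flag that is missing on either side counts as a mismatch here, since the workaround asks for it to be set explicitly.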

Timeouts

Occasionally, CDA may report timeouts inserting/retrieving from cache:

ERROR [HazelcastQueryCache] Timeout 5 SECONDS expired inserting into cdaCache (timeout#3)

If these become too frequent, CDA may temporarily start bypassing cache:

ERROR [HazelcastQueryCache] Too many timeouts, disabling for 5 seconds.

If this only happens occasionally it shouldn't be a big deal; it is just Hazelcast having trouble responding to a load spike. There are a few CDA properties that can tune this behavior:

pt.webdetails.cda.cache.getTimeout: Timeout in seconds to fetch data from cache. A query can block for up to this time waiting for the cache; on failure, the query is executed as if nothing was in cache.
pt.webdetails.cda.cache.putTimeout: Timeout in seconds for insertion after a successful query. This will not block the CDA query.
pt.webdetails.cda.cache.maxTimeouts: When this number of timeouts is reached, the cache is disabled for a period of time (disablePeriod) in an attempt not to overload Hazelcast.
pt.webdetails.cda.cache.disablePeriod: The cooldown period in seconds during which the cache is bypassed once maxTimeouts is reached.
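To make the interaction between maxTimeouts and disablePeriod more concrete, here is a toy model of the behavior described above. It is not CDA's actual code: the class and method names are invented, the values 3 and 5 are arbitrary, and resetting the counter after a successful operation is an assumption of this sketch.

```python
import time


class TimeoutGuard:
    """Toy model: bypass the cache for a while after too many timeouts."""

    def __init__(self, max_timeouts: int, disable_period: float, clock=time.monotonic):
        self.max_timeouts = max_timeouts      # plays the role of maxTimeouts
        self.disable_period = disable_period  # plays the role of disablePeriod (seconds)
        self.clock = clock
        self.timeouts = 0
        self.disabled_until = float("-inf")

    def cache_enabled(self) -> bool:
        """The cache is usable again once the cooldown has elapsed."""
        return self.clock() >= self.disabled_until

    def record_timeout(self) -> None:
        """Count a get/put timeout and start the cooldown when the limit is hit."""
        self.timeouts += 1
        if self.timeouts >= self.max_timeouts:
            self.disabled_until = self.clock() + self.disable_period
            self.timeouts = 0

    def record_success(self) -> None:
        """Assumption: a successful cache operation resets the counter."""
        self.timeouts = 0


guard = TimeoutGuard(max_timeouts=3, disable_period=5.0)
for _ in range(3):
    guard.record_timeout()
print(guard.cache_enabled())  # the cooldown has just started
```

While cache_enabled() is false, queries would go straight to the data source, which corresponds to the "disabling for 5 seconds" message shown earlier.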

If these errors are recurrent and the cache never seems to recover (i.e. no successful insertions or hits), it's a sign that no Hazelcast node is responding. This can be caused by a network failure, or a connected node may have crashed. Launching a new standalone instance should solve the problem.

prevention

The launch-hazelcast scripts are simple loops that relaunch a Hazelcast instance on failure. There must be a running non-lite instance for requests to be handled properly. Always having at least two Hazelcast standalone nodes running greatly improves resilience to crashes and prevents the loss of all cache data, since one node is always available to handle requests while the other recovers.

License

CDC is licensed under the MPLv2 license.
