Sunday 27 October 2013

Ehcache: Cache Replication in Clustered Environment using JGroups

If you are using Ehcache and you want to replicate your cache across all the nodes in a clustered environment, you may find some useful information in this post. There are three different ways to replicate your cache across all the nodes in a cluster:
  • JGroups Replicated Caching
  • RMI Replicated Caching
  • JMS Replicated Caching
This post covers 'JGroups Replicated Caching'. JGroups is a toolkit for reliable group communication and clustering. Its integration with Ehcache makes it easy to replicate the cache across the nodes in a cluster.

How to configure?

Cache replication configuration with JGroups is not very complicated. With a very simple configuration you can achieve cache replication in your clustered environment.


You need to configure the following files for cache replication:
  • ApplicationContext.xml (Spring's application context file)
  • Ehcache.xml (Ehcache configuration file)
  • JGroupsCache.xml (JGroups configuration file for node communication)
ApplicationContext.xml: Configure 'EhCacheManagerFactoryBean' in the application context file to initialize the cache manager.

<bean id="ehCacheManager"
    class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean">
    <property name="configLocation" value="classpath:Ehcache.xml"/>
    <property name="shared" value="true" />
</bean>

Ehcache.xml: To replicate a cache in a cluster you need to configure the following elements in the 'Ehcache.xml' file:
  • cacheManagerPeerProviderFactory: This tag is used to create a CacheManagerPeerProvider, which discovers other CacheManagers in the cluster.
  • cacheEventListenerFactory: Enables registration of listeners for cache events, such as put, remove, update, and expire.
  • bootstrapCacheLoaderFactory: Specifies a BootstrapCacheLoader, which is called by a cache on initialization to prepopulate itself.
Each cache that will be distributed needs to set a cache event listener which replicates messages to the other CacheManager peers. This can be done by adding a 'cacheEventListenerFactory' element of type 'JGroupsCacheReplicatorFactory' to each distributed cache's configuration as per the following example:

<?xml version="1.0" encoding="UTF-8"?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://ehcache.org/ehcache.xsd"
    updateCheck="false">
    
    <cacheManagerPeerProviderFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
        properties="file=JGroupsCache.xml" />  
   
    <!-- This cache is configured to be replicated -->
    <cache name="mycache" eternal="true" maxElementsInMemory="100"
        overflowToDisk="false" diskPersistent="false" timeToIdleSeconds="0"
        timeToLiveSeconds="60" memoryStoreEvictionPolicy="LRU">

      <cacheEventListenerFactory
            class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
            properties="replicateAsynchronously=true, replicatePuts=true,
            replicateUpdates=true, replicateUpdatesViaCopy=false,
            replicateRemovals=true" />   
       
      <bootstrapCacheLoaderFactory
            class="net.sf.ehcache.distribution.jgroups.JGroupsBootstrapCacheLoaderFactory"
            properties="bootstrapAsynchronously=false" />  
    </cache>

</ehcache>

JGroupsCache.xml: In this file you configure the cluster nodes and their listening ports for cache replication.

<?xml version="1.0" encoding="UTF-8"?>
<config>
   <TCP bind_addr="host1" bind_port="7831" />
   <!-- Two nodes are in the cluster -->
   <TCPPING timeout="3000"
       initial_hosts="host1[7831],host2[7832]"
       port_range="1"
       num_initial_members="2"/>
   <VERIFY_SUSPECT timeout="1500"  />
   <pbcast.NAKACK use_mcast_xmit="false" gc_lag="100"
      retransmit_timeout="300,600,1200,2400,4800"
      discard_delivered_msgs="false"/>
   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="400000"/>
   <pbcast.GMS print_local_addr="true" join_timeout="5000" shun="false" view_bundling="true"/>
</config>
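Note that each node must bind to its own address and port, while the TCPPING initial_hosts list stays identical on both nodes. Assuming the two hosts above, host2's copy of 'JGroupsCache.xml' would differ only in the TCP element (a sketch, adjust to your environment):

```xml
<!-- JGroupsCache.xml on host2: bind to this node's own address and port;
     all other protocol settings remain the same as on host1 -->
<TCP bind_addr="host2" bind_port="7832" />
```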

Frequently Asked Questions

1: What if I get the below log message when one node tries to send a cache notification to the others?
'Dropped message from host1-64423 (not in xmit_table)'

Solution: Set the 'discard_delivered_msgs' property to false in the JGroups configuration file.

2: How to keep the JGroups configuration file out of the web application WAR file?

Solution: In 'Ehcache.xml', you need not hard-code the 'JGroupsCache.xml' location. You can specify it with the help of a system property.

Define the JVM argument like -Djsgroup-config-location=C:\jgroups-configuration\JGroupsCache.xml
and then reference this property in the 'Ehcache.xml' file.

<cacheManagerPeerProviderFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
        properties="file=${jsgroup-config-location}" />
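Conceptually, the ${...} placeholder in the properties string is resolved against JVM system properties before the file is loaded. A minimal standalone sketch of that kind of substitution (the PropertySubstitution class and its expand method are hypothetical, for illustration only, not Ehcache's actual implementation):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertySubstitution {

    // Replace each ${name} token with the matching JVM system property,
    // leaving the token untouched when no such property is set.
    static String expand(String value) {
        Pattern token = Pattern.compile("\\$\\{([^}]+)\\}");
        Matcher m = token.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = System.getProperty(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Simulates passing -Djsgroup-config-location=C:\jgroups-configuration\JGroupsCache.xml
        System.setProperty("jsgroup-config-location",
                "C:\\jgroups-configuration\\JGroupsCache.xml");
        System.out.println(expand("file=${jsgroup-config-location}"));
        // prints: file=C:\jgroups-configuration\JGroupsCache.xml
    }
}
```

This keeps the environment-specific path out of the WAR: each deployment sets its own JVM argument, and the same 'Ehcache.xml' works everywhere.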

3: How to verify that the nodes are communicating with each other?

Solution: You should see the logs below on your server, which confirm that your nodes are registered as per the JGroups configuration.

Logs:
-------------------------------------------------------------------
GMS: address=IP-ADDRESS-41447, cluster=EH_CACHE, physical address=2002:19a1:70a:0:0:0:19a1:70a:58603
-------------------------------------------------------------------
[10/25/13 17:57:38:451 IST] 0000001e JGroupsCacheM I net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProvider init JGroups Replication started for 'EH_CACHE'. JChannel: local_addr=IP-ADDRESS-41447
cluster_name=EH_CACHE
my_view=[IP-ADDRESS-41447|0] [IP-ADDRESS-41447][IP-ADDRESS-41448]
connected=true
closed=false
incoming queue size=0
receive_blocks=false
receive_local_msgs=false
state_transfer_supported=true

4: How to enable Ehcache logs?

Solution: In your log4j configuration file, add the entries below to view the Ehcache-related logs.

<category name="net.sf.ehcache"  additivity="false">
     <priority value="debug" />
     <appender-ref ref="console" />
  </category>
  <category name="net.sf.ehcache.config"  additivity="false">
     <priority value="debug" />
     <appender-ref ref="console" />
  </category>
  <category name="net.sf.ehcache.distribution"  additivity="false">
     <priority value="debug" />
     <appender-ref ref="console" />
  </category>

Others
  • For more details about cache replication methods in Ehcache, you can refer to this link.
  • For detailed information about the above configuration, you can refer to this link.
  • You can also refer to a very nicely written post here.

2 comments:

  1. I have the exact setting like above but it seems to be doing a UDP connection instead of TCP, not sure if my interpretation is correct, can you pls check..
    UDP(bind_addr=/fe80:0:0:0:70f4:980e:2d13:aee8%11;oob_
    -------------------------------------------------------------------
    GMS: address=NIGSA725774-57928, cluster=EH_CACHE, physical address=fe80:0:0:0:70f4:980e:2d13:aee8%11:52066
    -------------------------------------------------------------------
    [INFO |2015-06-17 18:09:28.546|net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProvider] JGroups Replication started for 'EH_CACHE'. JChannel: local_addr=NIGSA725774-57928
    cluster_name=EH_CACHE
    my_view=[NIGSA725771-31043|3] [NIGSA725771-31043, NIGSA725774-57928]
    connected=true
    closed=false
    incoming queue size=0
    receive_blocks=false
    receive_local_msgs=false
    state_transfer_supported=true
    props=UDP(bind_addr=/fe80:0:0:0:70f4:980e:2d13:aee8%11;oob_thread_pool_keep_alive_time=5000;oob_thread_pool_enabled=true;max_bundle_size=64000;receive_on_all_interfaces=false;mcast_port=45588;thread_pool_min_threads=2;thread_pool_keep_alive_time=5000;enable_diagnostics=true;thread_pool_max_threads=8;ucast_send_buf_size=640000;ip_ttl=2;oob_thread_pool_queue_max_size=100;enable_bundling=true;thread_pool_queue_enabled=true;diagnostics_port=7500;oob_thread_pool_max_threads=8;disable_loopback=false;logical_addr_cache_max_size=20;ip_mcast=true;logical_addr_cache_expiration=120000;thread_pool_rejection_policy=discard;oob_thread_pool_min_threads=1;port_range=50;stats=true;mcast_send_buf_size=640000;id=21;mcast_recv_buf_size=25000000;diagnostics_addr=/ff0e:0:0:0:0:0:75:75;bind_port=0;tos=8;oob_thread_pool_rejection_policy=Run;loopback=true;oob_thread_pool_queue_enabled=false;enable_unicast_bundling=false;name=UDP;thread_pool_enabled=true;thread_naming_pattern=cl;ucast_recv_buf_size=20000000;discard_incompatible_packets=true;bundler_capacity=20000;max_bundle_timeout=30;mcast_group_addr=/ff0e:0:0:0:0:8:8:8;bind_interface_str=;marshaller_pool_size=0;num_timer_threads=4;log_discard_msgs=true;thread_pool_queue_max_size=10000;bundler_type=new)
    :PING(id=6;return_entire_cache=false;num_initial_members=3;break_on_coord_rsp=true;stats=true;name=PING;num_ping_requests=2;discovery_timeout=0;timeout=2000;num_initial_srv_members=0)
    :MERGE2(id=0;stats=true;merge_fast=true;name=MERGE2;inconsistent_view_threshold=1;min_interval=10000;merge_fast_delay=1000;max_interval=30000)
    :FD_SOCK(id=3;get_cache_timeout=1000;bind_addr=/fe80:0:0:0:70f4:980e:2d13:aee8%11;sock_conn_timeout=1000;bind_interface_str=;stats=true;name=FD_SOCK;suspect_msg_interval=5000;keep_alive=true;start_port=0;num_tries=3)
    :FD_ALL(id=29;interval=3000;stats=true;name=FD_ALL;msg_counts_as_heartbeat=false;timeout=5000)
    :VERIFY_SUSPECT(id=13;bind_addr=/fe80:0:0:0:70f4:980e:2d13:aee8%11;bind_interface_str=;stats=true;name=VERIFY_SUSPECT;num_msgs=1;use_icmp=false;timeout=1500)
    :BARRIER(id=0;max_close_time=60000;stats=true;name=BARRIER)
    :pbcast.NAKACK(gc_lag=0;use_mcast_xmit_req=false;use_mcast_xmit=true;max_msg_batch_size=20000;xmit_from_random_member=false;stats=true;retransmit_timeouts=300,600,1200;exponential_backoff=0;log_not_found_msgs=true;enable_xmit_time_stats=false;discard_delivered_msgs=true;print_stability_history_on_failed_xmit=false;id=15;xmit_history_max_size=50;use_stats_for_retransmission=false;max_rebroadcast_timeout=2000;name=NAKACK;log_discard_msgs=true;max_xmit_buf_size=0;use_range_based_retransmitter=true)
    :UNICAST(id=12;max_retransmit_time=60000;max_msg_batch_size=50000;loopback=false;sta

  2. Hi, thank you for your post.

    Do you have sometimes blocking messages at startup where my_view add a new address in its list (an unknown member not declared for example : my_view=[web1-29955|68] [web1-29955, b321f703-bea7-528f-828c-d4377a69de6d, web2-19991] where b321f703-bea7-528f-828c-d4377a69de6d is not a declared member) with the message :
    WARNING: web1-29955: no physical address for b321f703-bea7-528f-828c-d4377a69de6d, dropping message

    Thanks you for your reply.
    Best regards,
    J.
