Zookeeper and LevelDB replication non reliable

mdidato
Hello,
in my current project we are trying to set up an ActiveMQ cluster with LevelDB replication.
Before configuring it in production, we decided to run a short spike to try out some simple failure scenarios.
Our test setup has a ZooKeeper ensemble of three nodes and an ActiveMQ cluster of three nodes.

This is the ActiveMQ configuration we use (the hostname is, of course, different on each node in the cluster):

<persistenceAdapter>
    <replicatedLevelDB
        replicas="3"
        bind="tcp://0.0.0.0:0"
        hostname="activemq1"
        zkAddress="zk1:2181,zk2:2181,zk3:2181"
        zkPath="/activemq/leveldb-stores"
    />
</persistenceAdapter>
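Both halves of this setup are majority-based. ZooKeeper needs a strict majority of its ensemble to stay available, and the ActiveMQ replicated LevelDB documentation describes the replicas attribute as requiring at least (replicas/2)+1 nodes to be online to avoid an outage. The arithmetic below is a minimal sketch of that rule; the class QuorumMath is hypothetical and not part of either product. With three ZooKeeper servers and replicas="3", it shows that losing a single node should, in principle, leave both quorums intact.

```java
/**
 * Hypothetical helper (not part of ActiveMQ or ZooKeeper) illustrating the
 * majority rule used by a ZooKeeper ensemble and by replicated LevelDB.
 */
public final class QuorumMath {

    private QuorumMath() {}

    /** Smallest strict majority of the given member count: floor(n / 2) + 1. */
    public static int quorum(int members) {
        if (members < 1) {
            throw new IllegalArgumentException("members must be >= 1");
        }
        return members / 2 + 1;
    }

    /** Number of members that can fail while a majority is still available. */
    public static int toleratedFailures(int members) {
        return members - quorum(members);
    }

    public static void main(String[] args) {
        int zkEnsemble = 3; // zk1, zk2, zk3
        int replicas = 3;   // replicas="3" on <replicatedLevelDB>
        System.out.printf("ZooKeeper ensemble: quorum %d, tolerates %d failure(s)%n",
                quorum(zkEnsemble), toleratedFailures(zkEnsemble));
        System.out.printf("LevelDB replicas: quorum %d, tolerates %d failure(s)%n",
                quorum(replicas), toleratedFailures(replicas));
    }
}
```

A surviving majority does not mean the ensemble is never briefly unavailable: while a new ZooKeeper leader is being elected, the remaining servers typically stop serving clients, so connected clients such as the brokers can lose their connection for a short time.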


We have tried several scenarios and they seem to work.
The one scenario we cannot get to work is the ZooKeeper leader going down.
We start three ZooKeeper instances and three ActiveMQ instances.
We observe that a ZooKeeper leader is elected correctly. An ActiveMQ master is also elected correctly through ZooKeeper: it accepts messages from producers, consumers can read from it, and the web console works.
We then deliberately kill the ZooKeeper leader to see what happens. The other two ZooKeeper instances successfully elect a new leader, but the ActiveMQ master does not seem able to recover after the new leader is elected.

We get the following log output:

2016-10-03 15:15:53,185 | ERROR | Could not accept connection : java.lang.InterruptedException | org.apache.activemq.broker.TransportConnector | ActiveMQ Transport Server Thread Handler: mqtt://0.0.0.0:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600
2016-10-03 15:15:53,196 | INFO  | Master stopped | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[localhost] Task-4
2016-10-03 15:15:53,205 | INFO  | Connector ws stopped | org.apache.activemq.broker.TransportConnector | ActiveMQ BrokerService[localhost] Task-5
2016-10-03 15:15:53,217 | INFO  | Connector vm://localhost stopped | org.apache.activemq.broker.TransportConnector | ActiveMQ BrokerService[localhost] Task-5
2016-10-03 15:15:53,227 | WARN  | SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '../../conf.tmp/login.config'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(zk2.docker_default:2181)
2016-10-03 15:15:53,228 | INFO  | Opening socket connection to server zk2.docker_default/172.18.0.4:2181 | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(zk2.docker_default:2181)
2016-10-03 15:15:53,228 | WARN  | unprocessed event state: AuthFailed | org.apache.activemq.leveldb.replicated.groups.ZKClient | WrapperSimpleAppMain-EventThread
2016-10-03 15:15:53,230 | INFO  | Socket connection established to zk2.docker_default/172.18.0.4:2181, initiating session | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(zk2.docker_default:2181)
2016-10-03 15:15:53,241 | INFO  | Unable to read additional data from server sessionid 0x3578b1ac0d80000, likely server has closed socket, closing socket connection and attempting reconnect | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(zk2.docker_default:2181)
2016-10-03 15:15:53,363 | INFO  | JobSchedulerStore: /data/activemq/localhost/scheduler stopped. | org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl | ActiveMQ BrokerService[localhost] Task-5
2016-10-03 15:15:53,367 | INFO  | StateChangeDispatcher terminated. | org.apache.activemq.leveldb.replicated.groups.ZKClient | ZooKeeper state change dispatcher thread
2016-10-03 15:15:53,994 | INFO  | Session: 0x3578b1ac0d80000 closed | org.apache.zookeeper.ZooKeeper | ActiveMQ BrokerService[localhost] Task-5



The new ZooKeeper leader is indeed zk2, so the ActiveMQ master seems to know which server is leading the ensemble. However, as you can see, it cannot read any additional data from that server.

Has anyone managed to get this configuration working?


We are running the entire setup as a docker-compose stack:

version: '2'
services:
  zk1:
    container_name: zk1
    image: zookeeper:latest
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zk1:2888:3888 server.2=zk2:2888:3888 server.3=zk3:2888:3888

  zk2:
    container_name: zk2
    image: zookeeper:latest
    ports:
      - "22181:2181"
      - "22888:2888"
      - "33888:3888"
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zk1:2888:3888 server.2=zk2:2888:3888 server.3=zk3:2888:3888

  zk3:
    container_name: zk3
    image: zookeeper:latest
    ports:
      - "23181:2181"
      - "32888:2888"
      - "43888:3888"
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zk1:2888:3888 server.2=zk2:2888:3888 server.3=zk3:2888:3888

  activemq1:
    container_name: activemq1
    image: webcenter/activemq:5.13.2
    ports:
      - "61617:61616"
      - "18161:8161"
    volumes:
      - /Users/Video/Projects/ActiveMqSpike/docker/activemq-conf:/opt/activemq/conf
    depends_on:
      - zk1
      - zk2
      - zk3

  activemq2:
    container_name: activemq2
    image: webcenter/activemq:5.13.2
    ports:
      - "61618:61616"
      - "28161:8161"
    volumes:
      - /Users/Video/Projects/ActiveMqSpike/docker/activemq-conf2:/opt/activemq/conf
    depends_on:
      - zk1
      - zk2
      - zk3

  activemq3:
    container_name: activemq3
    image: webcenter/activemq:5.13.2
    ports:
      - "61619:61616"
      - "38161:8161"
    volumes:
      - /Users/Video/Projects/ActiveMqSpike/docker/activemq-conf3:/opt/activemq/conf
    depends_on:
      - zk1
      - zk2
      - zk3


The mounted volumes contain the configuration I pasted at the beginning of this post.
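To see which ZooKeeper server is the leader before and after the kill, independently of the broker logs, each server can be asked for its role with ZooKeeper's "srvr" four-letter-word command, whose reply contains a "Mode:" line such as "Mode: leader" or "Mode: follower". The sketch below assumes srvr is enabled (newer ZooKeeper releases restrict these commands with 4lw.commands.whitelist, whose default includes srvr) and uses the host ports 2181, 22181 and 23181 published for zk1, zk2 and zk3 in the compose file above. The class ZkRoleCheck is hypothetical, not part of ZooKeeper or ActiveMQ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic: asks each ZooKeeper server for its role via "srvr". */
public final class ZkRoleCheck {

    private ZkRoleCheck() {}

    /** Sends the "srvr" command to host:port and returns the raw text reply. */
    public static String srvr(String host, int port) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 2000); // 2 s connect timeout
            socket.setSoTimeout(2000);                                // 2 s read timeout
            OutputStream out = socket.getOutputStream();
            out.write("srvr".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            // The server closes the connection after replying, which ends this loop.
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Returns the value of the "Mode:" line, or "unknown" if the reply has none. */
    public static String parseMode(String reply) {
        for (String line : reply.split("\\R")) {
            if (line.startsWith("Mode:")) {
                return line.substring("Mode:".length()).trim();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Host ports mapped to 2181 on zk1, zk2 and zk3 in the compose file.
        int[] ports = {2181, 22181, 23181};
        for (int port : ports) {
            try {
                System.out.printf("localhost:%d -> %s%n", port, parseMode(srvr("localhost", port)));
            } catch (IOException e) {
                System.out.printf("localhost:%d -> unreachable (%s)%n", port, e.getMessage());
            }
        }
    }
}
```

Running it before and after stopping the leader container shows whether the two survivors settle on a leader/follower pair. A server that is still electing a leader may answer without a "Mode:" line, which this sketch reports as "unknown".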

Re: Zookeeper and LevelDB replication non reliable

Tim Bain
The logs you quoted seem to indicate that the Zookeeper server (zk2) closed
the connection on you.  Is there anything in its logs that might indicate
why?  If not, you might try increasing the logging level on the Zookeeper
servers...

Tim

On Oct 3, 2016 3:17 PM, "mdidato" <[hidden email]> wrote:

--
View this message in context: http://activemq.2283324.n4.nabble.com/Zookeeper-and-LevelDB-replication-non-reliable-tp4717449.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.