Artemis HA Failover from slave not working



boyangfan
Hi,

I have a single master-slave replication configuration, but I end up with
two live servers after I bring down the original master. This is being run
in a Kubernetes cluster using static connectors, but regardless, the
behavior is not what I expected.

Basically, the master-slave pair starts up correctly and then I bring down
the master. The slave becomes the live server, but when the original master
comes back up, the slave (which is now the live server) doesn't register
that the master has returned and doesn't fail back, so there end up being
two live servers. Perhaps it has something to do with the fact that it is
running in a Kubernetes environment?

Here are my configurations for my master-slave:

MASTER.XML
      <core xmlns="urn:activemq:core" xsi:schemaLocation="urn:activemq:core ">
        <acceptors>
          <acceptor name="netty-acceptor">tcp://0.0.0.0:61618</acceptor>
        </acceptors>

        <connectors>
          <connector name="master">tcp://artemis-servicemaster-0.artemis-service.test.svc.cluster.local:61618</connector>
          <connector name="slave">tcp://artemis-serviceslave-0.artemis-service.test.svc.cluster.local:61618</connector>
        </connectors>

        <ha-policy>
          <replication>
            <master>
              <group-name>group1</group-name>
              <check-for-live-server>true</check-for-live-server>
            </master>
          </replication>
        </ha-policy>

        <cluster-connections>
          <cluster-connection name="jms-activemq-artemis">
            <connector-ref>master</connector-ref>
            <retry-interval>500</retry-interval>
            <use-duplicate-detection>true</use-duplicate-detection>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <static-connectors>
              <connector-ref>master</connector-ref>
              <connector-ref>slave</connector-ref>
            </static-connectors>
          </cluster-connection>
        </cluster-connections>
      </core>

-------------------
SLAVE.XML
      <core xmlns="urn:activemq:core" xsi:schemaLocation="urn:activemq:core ">
        <acceptors>
          <acceptor name="netty-acceptor">tcp://0.0.0.0:61618</acceptor>
        </acceptors>

        <connectors>
          <connector name="master">tcp://artemis-servicemaster-0.artemis-service.test.svc.cluster.local:61618</connector>
          <connector name="slave">tcp://artemis-serviceslave-0.artemis-service.test.svc.cluster.local:61618</connector>
        </connectors>

        <ha-policy>
          <replication>
            <slave>
              <group-name>group1</group-name>
              <allow-failback>true</allow-failback>
            </slave>
          </replication>
        </ha-policy>

        <cluster-connections>
          <cluster-connection name="jms-activemq-artemis">
            <connector-ref>slave</connector-ref>
            <retry-interval>500</retry-interval>
            <use-duplicate-detection>true</use-duplicate-detection>
            <message-load-balancing>ON_DEMAND</message-load-balancing>
            <max-hops>1</max-hops>
            <static-connectors>
              <connector-ref>master</connector-ref>
              <connector-ref>slave</connector-ref>
            </static-connectors>
          </cluster-connection>
        </cluster-connections>
      </core>



Any help with this would be greatly appreciated!

--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Artemis HA Failover from slave not working

jbertram
When you bring the master down, what happens to its journal? The journal
(specifically the server.lock file) holds the node's unique identifier
(which is shared by the replicated slave). If the journal is lost when the
node is dropped, then the master has no way to pair with its slave when it
comes back up, which would explain the behavior you're observing.
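
As a rough illustration of the point above: the broker keeps server.lock in
its journal directory, so the data directories need to sit on storage that
outlives the pod. Below is a minimal sketch of the relevant broker.xml
elements, assuming a persistent volume is mounted at /var/lib/artemis/data
(a hypothetical path, not taken from the configuration in this thread):

```xml
<core xmlns="urn:activemq:core">
  <!-- Assumption: /var/lib/artemis/data is a hypothetical mount point backed
       by persistent storage; adjust it to wherever the volume is mounted. -->
  <paging-directory>/var/lib/artemis/data/paging</paging-directory>
  <bindings-directory>/var/lib/artemis/data/bindings</bindings-directory>
  <!-- The journal directory is where server.lock (the node's identifier) lives. -->
  <journal-directory>/var/lib/artemis/data/journal</journal-directory>
  <large-messages-directory>/var/lib/artemis/data/large-messages</large-messages-directory>
</core>
```

If these directories sit on ephemeral container storage instead, a restarted
master would start with an empty journal and, per the explanation above, have
no way to pair with its slave.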


Justin

On Tue, Nov 12, 2019 at 5:54 PM boyangfan <[hidden email]> wrote:


Re: Artemis HA Failover from slave not working

boyangfan
Hi Justin,

Thank you!! Looks like that was the issue. I wasn't creating a persistent
volume claim as part of my Kubernetes deployment, but after doing so, I no
longer see the issue.

Thanks again


