Can ActiveMQ take advantage of Oracle transparent application failover?

johanh
This post has NOT been accepted by the mailing list yet.
Hi list,

We are currently running a master-slave cluster of ActiveMQ nodes, with a JDBC persistence adapter backed by an Oracle 11g RAC database. This is working fine from an ActiveMQ perspective; when the master goes down, a slave automatically becomes the new master. However, we are having issues when one of the database nodes goes down, and I am looking for ways to have the ActiveMQ nodes handle this event as seamlessly as possible:

In order to take advantage of the transparent application failover feature of Oracle RAC, we have configured ActiveMQ to use a jdbcPersistenceAdapter with the Oracle OCI JDBC driver, using a data source config like this:

  <bean id="oracle-ds" class="oracle.jdbc.pool.OracleOCIConnectionPool" destroy-method="close">
    <property name="URL" value="jdbc:oracle:oci:@namefromtnsnames_ora"/>
    <property name="user" value="xxx"/>
    <property name="password" value="yyy"/>
    <property name="poolConfig">
      <props>
        <prop key="connpool_min_limit">1</prop>
        <prop key="connpool_max_limit">25</prop>
        <prop key="connpool_increment">1</prop>
        <prop key="connpool_timeout">10</prop>
      </props>
    </property>
  </bean>

When one of the database nodes goes down, the master ActiveMQ node loses its database lock, with the database returning an "ORA-25402: transaction must roll back" error. Per our DBA, this is normal behavior for Oracle RAC, and is the Oracle OCI JDBC driver's cue to retry the connect. However, this error causes the master ActiveMQ node to shut down entirely:

2011-05-13 16:20:19,494 [ActiveMQ Cleanup Timer] INFO  org.apache.activemq.store.jdbc.JDBCPersistenceAdapter.stopBroker():619 - No longer able to keep the exclusive lock so giving up being a master
2011-05-13 16:20:19,494 [ActiveMQ Cleanup Timer] INFO  org.apache.activemq.broker.BrokerService.stop():572 - ActiveMQ Message Broker (xxx, ID:xxx-39440-1305317526725-0:1) is shutting down

After this, the earlier master node is completely shut down (the process is no longer running), and none of the earlier slave nodes have taken over as master. Restarting the master node ActiveMQ service makes it reacquire the lock and become master again. Comparing these observations to what we see in the database, it seems that the OCI driver is attempting to fail over the connection to another database node (we can see some connections from it failing over), but apparently that process gets interrupted by the ActiveMQ master node shutting down per the above. Is there some way in ActiveMQ to have the master node either attempt to reacquire the lock itself, or wait for some time before shutting down, or is there some other way in which database transparent application failover has been implemented in ActiveMQ?

Any advice appreciated!

Regards,
Johan H

Re: Can ActiveMQ take advantage of Oracle transparent application failover?

DT78
Hello,
You seem to know a lot about OCI configuration with ActiveMQ.

When I run ActiveMQ with OCI, I get this error:
java.lang.UnsatisfiedLinkError: no ocijdbc10 in java.library.path

How did you get the ocijdbc10 library?
Is it an ocijdbc10.jar or an ocijdbc10.so library that must exist and be specified in -Djava.library.path?

Thank you in advance.
DT

Re: Can ActiveMQ take advantage of Oracle transparent application failover?

bsmith@skylinenet.net
In reply to this post by johanh
This question was posted by johanh on May 16, 2011 at 11:18am, but I cannot locate any follow-up or resolution.

Currently (04/25/2017) we have the same issue and I need to understand how to integrate Active MQ with Oracle 12.1.0.2 TAF (Transparent Application Failover) so that bouncing one node of the Oracle RAC doesn't break Active MQ as described in the original post below.

I have read and understand the Oracle TAF documentation and combed through Active MQ lists.  Just need some input on how to get Active MQ to integrate with Oracle TAF.

Any pointers to manuals/documentation or resources describing tests (successful or not) to get Active MQ to work with Oracle TAF are appreciated.  

Thanks
-- bsmith@skylinenet.net

Re: Can ActiveMQ take advantage of Oracle transparent application failover?

Tim Bain
The JDBC store hasn't received as much attention as the KahaDB and LevelDB
stores, so it's not entirely surprising that you're seeing the same
behavior as that earlier post.

It sounds like you have an understanding of how an application is supposed
to interact with an Oracle TAF database in this scenario (and I'm not sure
whether any of the people who do ActiveMQ development do), so you could
write a feature request in JIRA to ask for the Oracle driver to behave in
the appropriate ways. However, asking the broker to reset back to a clean
state and attempt to start again may be more effort than people want to put
into it.

Luckily there's a workaround: start the ActiveMQ process in a loop that
will start a new process if the first one dies. You can write your own
shell script, or you may be able to use a third-party wrapper such as
Tanuki. (I don't have any experience with Tanuki, so I can't say for sure
that their wrapper will allow this behavior, but I think it does.)

I'm also concerned about the fact that the slave broker didn't acquire the
lock when the master died; that feels like a second, separate bug to enter
into JIRA, please. Please include any relevant log lines from the slave
broker that might explain why it didn't work as expected. If possible, it
would also be great to get a thread dump from the slave broker when the
master goes down, since that might help someone trace why the existing code
didn't work as expected.

Tim

Re: Can ActiveMQ take advantage of Oracle transparent application failover?

bsmith@skylinenet.net
Tim,

Thanks for the detailed reply. I will share with our app design/development team for more in-depth discussion and post back with status on any workarounds or coding we come up with.

Also, I may consider entering a feature request as you suggested. I agree it may be a bigger update than the team wants to tackle, but if nothing else it may stir some more discussion and brainstorming.

I've got to believe someone somewhere has had the same architecture and wanted the same outcome: Oracle RAC with an Active MQ master/slave pair, where taking down one node of the RAC (for maintenance or because of an unexpected problem) leaves Active MQ running on the remaining node.

Just to clarify, I believe the slave broker didn't acquire the lock because both master and slave appear to have still been pointed at the Oracle RAC node that was taken down for maintenance. Node 2 was left up, and we expected Active MQ to connect to the DB through that node. (We’re using an Oracle SCAN listener, i.e. Single Client Access Name.) After we did the maintenance and rebooted node 1 of the RAC, Active MQ was back and live (on that node). So we may be able to do some scripting and redirecting from the Oracle side when a RAC node goes down... we'll see.

Again, thanks. If you or anyone else has more suggestions or experiences to share, I'd be glad to hear them and keep the thread alive.

Thanks,

-- bsmith@skylinenet.net


Re: Can ActiveMQ take advantage of Oracle transparent application failover?

Steve Hill
I have not used Oracle RAC, but it sounds like ActiveMQ is tied to a single instance rather than to any available Oracle server. Have you considered putting a load balancer in front that can determine whether the back end is up and route requests appropriately? I'm not sure whether this would work, as I have no prior experience with it.

Thanks

Re: Can ActiveMQ take advantage of Oracle transparent application failover?

bsmith@skylinenet.net
Thanks Steve, good point.

In this standard Oracle RAC configuration the SCAN listener effectively plays the role of load balancer. Active MQ connects to the database through port 1610 and then is directed to node a or node b on port 1521.

I’ve found an Oracle document, How To Configure Server Side Transparent Application Failover (Doc ID 460982.1), with which I am going to experiment. This is as opposed to a client-side approach, which would mean somehow modifying Active MQ to include the OCI (Oracle Call Interface) code to perform the node switch.

Whether Oracle will seamlessly redirect to the surviving node looks like it will depend on how, and how often, Active MQ queries the ACTIVEMQ_LOCK table.

Thanks for comments and suggestions, please keep them coming.  I’ll post progress.

Thanks,
-- bsmith@skylinenet.net


Re: Can ActiveMQ take advantage of Oracle transparent application failover?

Tim Bain
Pluggable storage lockers (
http://activemq.apache.org/pluggable-storage-lockers.html) could let you
work around any issues with the locking behavior under this configuration.

The code for the current Oracle interface is at
https://github.com/apache/activemq/blob/master/activemq-jdbc-store/src/main/java/org/apache/activemq/store/jdbc/adapter/OracleJDBCAdapter.java
(superclass:
https://github.com/apache/activemq/blob/master/activemq-jdbc-store/src/main/java/org/apache/activemq/store/jdbc/adapter/DefaultJDBCAdapter.java).
The code isn't all that complicated, so implementing a TAF capability
shouldn't be too difficult a task. And if you contribute the code back to
us, you won't have to maintain it and reapply it to future versions when
you upgrade.

Tim
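
For reference, the pluggable storage lockers page linked above describes a lease-based database locker that can replace the default exclusive-lock behavior. A minimal sketch follows, assuming a broker version that supports pluggable lockers and reusing the oracle-ds data source bean from the original post. The interval values are illustrative rather than tuned, and nobody in this thread has confirmed that this configuration survives a RAC node failure.

```xml
<!-- Goes inside the <broker> element of activemq.xml. -->
<persistenceAdapter>
  <!-- lockKeepAlivePeriod (ms): how often the master renews its lease. -->
  <!-- "#oracle-ds" refers to the data source bean from the original post. -->
  <jdbcPersistenceAdapter dataSource="#oracle-ds" lockKeepAlivePeriod="5000">
    <locker>
      <!-- lockAcquireSleepInterval (ms): how often a broker retries acquiring the lease. -->
      <lease-database-locker lockAcquireSleepInterval="10000"/>
    </locker>
  </jdbcPersistenceAdapter>
</persistenceAdapter>
```

The keep-alive period is set shorter than the acquire interval so that a healthy master renews its lease before a slave's next attempt. Because a lease expires on its own, a slave may be able to take over even when the old master's database session is left hanging, which is the situation described earlier in the thread.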
