[GitHub] activemq-artemis pull request #2444: ARTEMIS-2186 Large message incomplete w...

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis pull request #2444: ARTEMIS-2186 Large message incomplete w...

clebertsuconic-3
GitHub user wy96f opened a pull request:

    https://github.com/apache/activemq-artemis/pull/2444

    ARTEMIS-2186 Large message incomplete when server is crashed

    When sending large message, the large message file on server side is not synced whereas the journal record or paged message is synced.
   
    When server crashes, large message file maybe incomplete.
   
    In our case, we saw an empty file when server crashed and started. Then a large message with zero size body was delivered to client, client occured exception:
   
    AMQ212058: Unable to get a message.: java.lang.IndexOutOfBoundsException: readerIndex(4) + length(1) exceeds writerIndex(4): UnpooledDuplicatedByteBuf(ridx: 4, widx: 4, cap: 4, unwrapped: UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeHeapByteBuf(ridx: 4, widx: 4, cap: 4))@ClientLargeMessageImpl

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wy96f/activemq-artemis sync_large_message

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/activemq-artemis/pull/2444.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2444
   
----
commit 96f7daf39e09bae3503d572a94f381a3f108c2db
Author: yang wei <wy96fyw@...>
Date:   2018-11-27T07:14:31Z

    ARTEMIS-2186 Large message incomplete when server is crashed

----


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
Github user franz1981 commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    A test would not easily hit this: most OS take care to flush the page cache if a process crash "fsyncing" any pending contents. Only using a VM on a real machine shutdown would emulate this behaviour.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user michaelandrepearce commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    I appreciate its not easy to replicate. But with some mocks maybe it might. The key thing is how do we ensure this is never regressed?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user sebthom commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    From the description it is unclear what is meant by "server crash". JVM crash or host/machine crash. If the problem occured with unexpected JVM process termination maybe the broker can be started in a separate JVM processes and SIGKILLed while doing large message processing to reproduce the issue.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis pull request #2444: ARTEMIS-2186 Large message incomplete w...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user clebertsuconic commented on a diff in the pull request:

    https://github.com/apache/activemq-artemis/pull/2444#discussion_r236681572
 
    --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/protocol/core/ServerSessionPacketHandler.java ---
    @@ -1018,6 +1018,7 @@ private void sendContinuations(final int packetSize,
              currentLargeMessage.addBytes(body);
     
              if (!continues) {
    +            currentLargeMessage.sync();
                 currentLargeMessage.releaseResources();
    --- End diff --
   
    wouldn't releaseResources make a sync when the file is closed?


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user wy96f commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    > From the description it is unclear what is meant by "server crash". JVM crash or host/machine crash. If the problem occured with unexpected JVM process termination maybe the broker can be started in a separate JVM processes and SIGKILLed while doing large message processing to reproduce the issue.
   
    In our case it's a host crash. I don't think jvm crash or sending SIGKILL can reproduce it as franz1981 said linux os would take care to flush the page cache.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user clebertsuconic commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    I am a bit confused as a file.close would issue a sync.  


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user clebertsuconic commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    The release resources thing.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user wy96f commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    > I am a bit confused as a file.close would issue a sync.
   
    It won't. As man page said:
    A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes.  It is not common for a file system to flush the buffers when the stream is closed.  If you need to be sure that the data is physically stored use fsync(2).  (It will depend on the disk hardware at this point.)



---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user clebertsuconic commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    nice catch then.. i will merge this and back port into 2.6.x


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis pull request #2444: ARTEMIS-2186 Large message incomplete w...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user wy96f commented on a diff in the pull request:

    https://github.com/apache/activemq-artemis/pull/2444#discussion_r236911890
 
    --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/protocol/core/ServerSessionPacketHandler.java ---
    @@ -1018,6 +1018,7 @@ private void sendContinuations(final int packetSize,
              currentLargeMessage.addBytes(body);
     
              if (!continues) {
    +            currentLargeMessage.sync();
                 currentLargeMessage.releaseResources();
    --- End diff --
   
    Some other code is calling releaseResources such as LargeServerMessageImpl::deleteFile, LargeMessageDeliverer::finish which would cost an extra system call. We'd better sync when the large message file is first created and finished.
    I just found ServerSessionImpl::messageToLargeMessage needs a sync as well.
   
    Could we add a sync parameter in releaseResources? If true, file will be synced before closing.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user wy96f commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    > nice catch then.. i will merge this and back port into 2.6.x
   
    I just found in ServerSessionImpl::messageToLargeMessage, StompSession::sendInternalLarge
    , ReplicationEndpoint::stop we need a sync too.
   
    Could we add a sync parameter in releaseResources? If true, file will be synced before closing.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user clebertsuconic commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    I would just always sync on releaseResources. I don't see any case where we wanted to close without a previous sync given the semantic I missed.


---
Reply | Threaded
Open this post in threaded view
|

[GitHub] activemq-artemis issue #2444: ARTEMIS-2186 Large message incomplete when ser...

clebertsuconic-3
In reply to this post by clebertsuconic-3
Github user michaelandrepearce commented on the issue:

    https://github.com/apache/activemq-artemis/pull/2444
 
    Re testing could you simply not use a mocking framework to assert that file.sync was invoked when releaseResources is called? This way would ensure the file sync change isnt regressed from there.


---