
GEODE-9854: Orphaned .drf file causing memory leak #7145

Merged: 5 commits into apache:develop on Dec 9, 2021

Conversation

@jvarenina (Contributor) commented Nov 28, 2021

Issue:
OpLog files are compacted, but the .drf file is left behind because it contains deletes of
entries in previous .crfs. The .crf file is deleted, but the orphaned .drf is not deleted until all
previous .crf files (.crfs with smaller ids) are deleted.

The problem is that the compacted Oplog object representing the orphaned .drf file holds
a structure in memory (Oplog.regionMap) that contains information that is no longer useful
after compaction, and it occupies a certain amount of memory. Besides, there is a special
case when creating .krf files that, depending on the execution order,
can make the problem more severe: it can leave the pendingKrfTags structure
on the regionMap, and this can take up a significant amount of memory. This
pendingKrfTags HashMap is actually empty, but it still consumes memory because it was used
previously and the capacity of the HashMap was not reduced after it was cleared.
This special case usually happens when a new Oplog is rolled out and the previous Oplog
is immediately marked as eligible for compaction. Compaction and .krf creation start at
around the same time, and the compactor cancels creation of the .krf file.
The pendingKrfTags structure is normally cleared when the .krf file is created, but since
compaction canceled creation of the .krf, the pendingKrfTags structure remains in memory
until the Oplog representing the orphaned .drf file is deleted.

Solution:
Clear the regionMap data structure of the Oplog when it is compacted (currently it is
only deleted when the Oplog is destroyed).

Co-authored-by: Alberto Gomez alberto.gomez@est.tech

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • Is your initial contribution a single, squashed commit?

  • Does gradlew build run cleanly?

  • Have you written or updated unit tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

@@ -5942,6 +5943,8 @@ void cleanupAfterCompaction(boolean compactFailed) {
if (!compactFailed) {
// all data has been copied forward to new oplog so no live entries remain
getTotalLiveCount().set(0);
// No need for regionMap as there are no more live entries and .crf will be deleted
regionMap.set(new ConcurrentHashMap<>());
Contributor (@upthewaterspout) commented:
I think this might be simpler if you just called clear on the regionMap? No need for a separate atomic reference to a region map.

@jvarenina (Contributor, author) commented Dec 2, 2021:

@upthewaterspout thank you for the review!

What you are proposing was actually our first solution to this problem, which we later changed to the current one. When instantiated, the regionMap HashMap is created with an initial capacity of 16. This is a really small value, and it doesn't add much to the heap memory. The problem is that this HashMap can grow a lot from its initial size during the lifetime of the Oplog. It can contain hundreds or even more than a thousand elements. In order to hold that many elements, the HashMap has to be rehashed to a bigger capacity. The problem is that the new (bigger) capacity remains even after we clear the regionMap HashMap, so even when it is empty it consumes a non-negligible amount of memory.

We now know that it is possible for a server to accumulate a lot of orphaned .drf files during its lifetime, and the difference between a capacity of 16 and a capacity of 1024 is huge in terms of memory consumption. This is why we decided to use an atomic reference and simply reinitialize regionMap with a new object of initial size 16.
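The capacity argument above can be illustrated with a short, self-contained sketch. This is not Geode code: the class and field names are ours, and the map type and sizes are assumptions chosen only to show why the PR swaps in a fresh map via an AtomicReference instead of calling clear().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

public class RegionMapReset {
    // Hypothetical stand-in for Oplog.regionMap: an AtomicReference whose map
    // can be replaced wholesale instead of cleared in place.
    static final AtomicReference<Map<Long, Object>> regionMap =
        new AtomicReference<>(new ConcurrentHashMap<>());

    public static void main(String[] args) {
        Map<Long, Object> map = regionMap.get();
        for (long id = 0; id < 2000; id++) {
            map.put(id, new Object()); // forces the table to grow well past the default 16 buckets
        }
        map.clear(); // removes the entries, but the enlarged bucket array is retained

        // Swapping in a new map (the approach taken in this PR) makes the old,
        // oversized table unreachable so it can be garbage collected.
        regionMap.set(new ConcurrentHashMap<>());
        System.out.println(regionMap.get().isEmpty()); // true
    }
}
```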

@DonalEvans (Contributor) left a comment:

I agree with Dan's comment, but otherwise looks good. I especially appreciate the thorough explanations of what the tests are doing.

jvarenina and others added 2 commits December 3, 2021 15:45
* introduced inner static class RegionMap in Oplog
* RegionMap.get() will always return an empty map if it was closed before
* When closing a disk region, skip adding a drf-only oplog to the unrecovered
map, and also don't try to remove it from regionMap (it was already
removed during compaction).
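The second bullet (get() always returning an empty map after close) could look roughly like the following sketch. This is an assumed shape for illustration, not the actual Geode inner class; only the close-then-read behavior is modeled.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegionMapSketch {
    // Rough sketch of the idea behind the inner static RegionMap class: once
    // closed, get() always hands out an empty map, so late readers cannot
    // observe or resurrect stale per-region state.
    static class RegionMap<K, V> {
        private volatile Map<K, V> map = new ConcurrentHashMap<>();
        private volatile boolean closed = false;

        Map<K, V> get() {
            // After close(), callers always see an empty map.
            return closed ? Collections.emptyMap() : map;
        }

        void close() {
            closed = true;
            map = new ConcurrentHashMap<>(); // drop the grown table for GC
        }
    }

    public static void main(String[] args) {
        RegionMap<Long, String> rm = new RegionMap<>();
        rm.get().put(1L, "dri");
        rm.close();
        System.out.println(rm.get().isEmpty()); // true: a closed map reads as empty
    }
}
```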
totalLiveCount.addAndGet(-clearCount);
// no need to call handleNoLiveValues because we now have an
// unrecovered region.
if (!isDrfOnly()) {
Contributor commented:
If this is a safe change to make to close, then it seems like you should also avoid calling lockCompactor if isDrfOnly(). It is called to prevent the compactor from running while the code inside the try block is running. Or did you determine that isDrfOnly can only be safely called while the compactor is not running?
The only concern I have about this new isDrfOnly check is that it also causes us not to call addUnrecoveredRegion. Most of the places that use the unrecoveredRegion count use it to figure out whether this oplog might have some live values. Since a drf-only oplog has no live values, those places in the code don't need to worry about it.
But the count is also used in checkForRecoverableRegion, which calls dri.testAndSetRecovered, which associates the dri with its DiskRegionView. I think we probably still need to do this, and if I'm correct then we still need to call addUnrecoveredRegion here even if isDrfOnly() is true.
For the places that use unrecoveredRegionCount > 0 to indicate that something can't be done, could you also check isDrfOnly there? I'm thinking of hasNoLiveValues() and needsCompaction().

@jvarenina (Contributor, author) commented Dec 6, 2021:

We put the isDrfOnly() check after lockCompactor() on purpose, because of the following case:

The Oplog files (.crf, .drf and .krf) are in the process of being compacted, and the compactor lock is held. If we checked isDrfOnly() before trying to acquire the lock, it would most probably return false, because all Oplog files (.crf, .drf and .krf) exist during compaction (the files are deleted as the last step of compaction, so the check can only return true after that point). After the check, the thread would block (on lockCompactor()) until compaction is done. Compaction could then produce a .drf-only file and release the lock, and in that case the thread would resume and execute the unnecessary code.

Because we put the isDrfOnly() check after lockCompactor(), the thread first blocks until compaction is done. If compaction produces only a .drf file, then isDrfOnly() is checked after acquiring the lock and the unnecessary code is skipped.
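The check-after-lock ordering described above can be modeled with a small sketch. The class and field names here are ours, and lockCompactor()/isDrfOnly() are simplified to a ReentrantLock and a volatile flag; this is not the real Oplog code.

```java
import java.util.concurrent.locks.ReentrantLock;

public class CloseOrderingSketch {
    private final ReentrantLock compactorLock = new ReentrantLock();
    private volatile boolean drfOnly = false;
    volatile boolean didBookkeeping = false;

    void compactToDrfOnly() {
        compactorLock.lock();
        try {
            drfOnly = true; // compaction deleted the .crf; only the .drf remains
        } finally {
            compactorLock.unlock();
        }
    }

    void close() {
        compactorLock.lock(); // waits for any in-flight compaction to finish
        try {
            // The drf-only check happens only AFTER the lock is held, so a
            // compaction that just produced a .drf-only oplog is visible here
            // and the unnecessary bookkeeping is skipped.
            if (!drfOnly) {
                didBookkeeping = true; // addUnrecoveredRegion(...), clear live entries, ...
            }
        } finally {
            compactorLock.unlock();
        }
    }

    public static void main(String[] args) {
        CloseOrderingSketch oplog = new CloseOrderingSketch();
        oplog.compactToDrfOnly();
        oplog.close();
        System.out.println(oplog.didBookkeeping); // false: bookkeeping was skipped
    }
}
```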

We think it is safe to skip addUnrecoveredRegion() when the Oplog is a .drf-only file, because a .drf-only Oplog cannot contain live entries and therefore cannot contain unrecovered live region entries. We would expect that unrecovered live region entries are only possible while a .crf file is available (keep in mind that the .crf file is deleted by compaction when only the .drf file is left). And that would only happen when one of the regions using this disk-store Oplog is not yet recovered. In that case it is not possible to perform compaction, since Geode did not load the unrecovered region's data from the .crf file into memory and does not have the live entry references needed to do the compaction (please check the linked list DiskRegionInfoWithList.liveEntries). We are not sure whether we got this case right or whether there is something we misunderstood.

We also think it is not necessary to add an isDrfOnly() check in the places where unrecoveredRegionCount > 0 is checked (e.g. hasNoLiveValues() and needsCompaction()). If compaction produced only a .drf file, then both totalLiveCount and unrecoveredRegionCount will be 0 for that Oplog object. If unrecoveredRegionCount were greater than 0, compaction would not be done in the first place, and the .crf would not be deleted.

  1. Function hasNoLiveValues() -> This function already checks whether totalLiveCount.get() <= 0. This condition will always be true for .drf-only files.

  2. Function needsCompaction() -> I'm not sure in which case Geode would try to mark a .drf-only file as ready for compaction. Even if this happens, the compactor thread will check whether hasNoLiveValues() is true and then call handleNoLiveValues(). This function will then (if this is the oldest .drf-only Oplog) delete all .drf-only files that don't have a .crf file with an older id. Otherwise (if it is not the oldest .drf-only file) it will just call handleEmpty(), which cancels creation of the .krf and simply returns, as the .drf-only Oplog is already closed. I'm not sure whether adding this additional check in needsCompaction() would break something. Please check the code below:

void handleNoLiveValues() {
  if (!doneAppending) {
    return;
  }
  if (hasNoLiveValues()) {
    if (LocalRegion.ISSUE_CALLBACKS_TO_CACHE_OBSERVER) {
      if (calledByCompactorThread()) {
        // after compaction, remove the oplog from the list & destroy it
        CacheObserverHolder.getInstance().beforeDeletingCompactedOplog(this);
      } else {
        CacheObserverHolder.getInstance().beforeDeletingEmptyOplog(this);
      }
    }
    if (isOldest()) {
      if (calledByCompactorThread()) {
        // do it in this compactor thread
        handleEmptyAndOldest(true);
      } else {
        // schedule another thread to do it
        getParent().executeDiskStoreTask(() -> handleEmptyAndOldest(false));
      }
    } else {
      if (calledByCompactorThread()) {
        // do it in this compactor thread
        handleEmpty(true);
      } else {
        // schedule another thread to do it
        getParent().executeDiskStoreTask(() -> handleEmpty(false));
      }
    }
  }
}

Contributor commented:

I think you have addressed most of my concerns. I'm still worried about the following corner case if we don't add a drf-only oplog to the unrecovered region count.
What happens if you have a region that still has live data in an old crf oplog and only deletes in a newer drf oplog? That newer oplog will be kept alive by the older crf, so it will be marked as drf-only. Then someone calls Region.close on that region. This causes the persistent data to be kept on disk but removed from memory. Then that same member turns around and recreates the region it closed. If close did not remember that the drf-only oplog had an "unrecovered region", I'm not sure recovery will work when the region is recreated. Since recreation does not reread the disk store files (if, crf, drf, krf) in their entirety (as we do during a normal startup), enough info needs to be kept in memory so that we know the drf-only oplog has some info about the region we are recreating. I think this association on recreate happens in checkForRecoverableRegion, which calls dri.testAndSetRecovered, which associates the dri with its DiskRegionView.
Does a test exist for this use case (close + recreate with a drf-only oplog)? If not and you add one, you might find that drf-only oplogs are handled in some other way during recreate, without being in the unrecoveredRegionCount.

@jvarenina (Contributor, author) commented:

We have created the following test cases (for more information, please check the comments in the test cases):

  1. Recovery of single region after cache is restarted (testCompactorRegionMapDeletedForOnlyDrfOplogAfterCompactionAndRecoveryAfterCacheClosed)

  2. Recovery of single region after region is closed and then recreated (testCompactorRegionMapDeletedForOnlyDrfOplogAfterCompactionAndRecoveryAfterRegionClose)

In all cases, all data was successfully recovered from the Oplog files and loaded into cache memory. It seems that this counter is only used to skip compaction of an Oplog (.crf file) when a region that uses the Oplog (.crf file) has not yet recovered (e.g. the region was closed). An orphaned/.drf-only file that references an older .crf Oplog cannot be deleted until the older .crf Oplog is deleted. So having unrecoveredRegionCount on the .crf also prevents deleting any newer .drf file that could contain destroy records for its entries. We could not find a case where not marking a .drf-only Oplog as unrecovered during close causes an issue. Could you please share the exact steps for the case that you think will cause a problem, and I will try to create it?

During these tests we found another problem in the existing design (not introduced by this PR) when a region is closed and then recreated to start recovery. If you inspect this code in the close() function, you will notice that it doesn't make sense:

void close(DiskRegion dr) {
  // while a krf is being created can not close a region
  lockCompactor();
  try {
    if (!isDrfOnly()) {
      addUnrecoveredRegion(dr.getId());
      DiskRegionInfo dri = getDRI(dr);
      if (dri != null) {
        long clearCount = dri.clear(null);
        if (clearCount != 0) {
          totalLiveCount.addAndGet(-clearCount);
          // no need to call handleNoLiveValues because we now have an
          // unrecovered region.
        }
        regionMap.get().remove(dr.getId(), dri);
      }
    }
  } finally {
    unlockCompactor();
  }
}
private void addUnrecoveredRegion(long drId) {
  DiskRegionInfo dri = getOrCreateDRI(drId);
  if (dri.testAndSetUnrecovered()) {
    unrecoveredRegionCount.incrementAndGet();
  }
}

Please notice that addUnrecoveredRegion() marks the DiskRegionInfo object as unrecovered and increments the unrecoveredRegionCount counter. This DiskRegionInfo object is contained in the regionMap structure. Afterwards, close() removes the DiskRegionInfo object (which was previously marked as unrecovered) from the regionMap. This doesn't make sense: it updates the object and then removes it from the map to be garbage collected. As you will see below, this causes issues when the region is recovered.

Please check this code at recovery:


/**
 * For each dri that this oplog has that is currently unrecoverable check to see if a DiskRegion
 * that is recoverable now exists.
 */
void checkForRecoverableRegion(DiskRegionView dr) {
  if (unrecoveredRegionCount.get() > 0) {
    DiskRegionInfo dri = getDRI(dr);
    if (dri != null) {
      if (dri.testAndSetRecovered(dr)) {
        unrecoveredRegionCount.decrementAndGet();
      }
    }
  }
}

The problem is that Geode does not clear the unrecoveredRegionCount counter in Oplog objects after recovery is done. checkForRecoverableRegion checks the unrecoveredRegionCount counter and performs testAndSetRecovered. testAndSetRecovered will always return false, because none of the DiskRegionInfo objects in the regionMap have the unrecovered flag set to true (all objects marked as unrecovered were deleted by close() and then recreated during recovery; see the note below). The result is that all Oplogs will be fully recovered while the counter incorrectly indicates unrecoveredRegionCount > 0. This will later prevent compaction of the recovered Oplogs (the files that have .crf, .drf and .krf) when they reach the compaction threshold.

Note: During recovery, the regionMap is recreated from the Oplog files. Since all DiskRegionInfo objects were deleted from the regionMap during close(), they are recreated via initRecoveredEntry during recovery. All DiskRegionInfo objects are created with the unrecovered flag set to false.
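The counter mismatch described above can be reduced to a minimal model. DiskRegionInfo here is a toy stand-in that only models the unrecovered flag from the snippets quoted earlier; it is not the real Geode class.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class UnrecoveredCountSketch {
    // Toy stand-in for DiskRegionInfo: only the unrecovered flag is modeled.
    static class DiskRegionInfo {
        private boolean unrecovered = false;

        boolean testAndSetUnrecovered() {
            if (unrecovered) return false;
            unrecovered = true;
            return true;
        }

        boolean testAndSetRecovered() {
            if (!unrecovered) return false;
            unrecovered = false;
            return true;
        }
    }

    static int simulateCloseThenRecover() {
        AtomicInteger unrecoveredRegionCount = new AtomicInteger();

        // close(): mark the dri unrecovered and bump the counter...
        DiskRegionInfo dri = new DiskRegionInfo();
        if (dri.testAndSetUnrecovered()) {
            unrecoveredRegionCount.incrementAndGet();
        }
        // ...but close() then removes this dri from regionMap, losing the flag.

        // Recovery recreates the dri from the oplog files; the fresh object
        // starts with unrecovered == false.
        DiskRegionInfo recreated = new DiskRegionInfo();

        // checkForRecoverableRegion(): testAndSetRecovered() returns false on
        // the fresh object, so the counter is never decremented.
        if (recreated.testAndSetRecovered()) {
            unrecoveredRegionCount.decrementAndGet();
        }
        return unrecoveredRegionCount.get();
    }

    public static void main(String[] args) {
        System.out.println(simulateCloseThenRecover()); // 1: stuck above zero, blocking compaction
    }
}
```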

Contributor commented:

Thanks for adding the additional tests.
Is the solution to the new problem you discovered to not remove the dri from the regionMap in close? I think this could be a different ticket since it is unrelated to the memory leak.

@jvarenina (Contributor, author) commented:

We also agree that this new issue should be handled in a new ticket. We will create it and try to propose a solution. As you said, the solution could simply be to not remove the dri from the regionMap in close() for an Oplog that is not .drf-only.

Thank you for review and approval!

1. Recovery of single region after cache is closed and then recreated
(testCompactorRegionMapDeletedForOnlyDrfOplogAfterCompactionAndRecoveryAfterCacheClosed)

2. Recovery of single region after region is closed and then recreated
(testCompactorRegionMapDeletedForOnlyDrfOplogAfterCompactionAndRecoveryAfterRegionClose)
@dschneider-pivotal (Contributor) commented:

The acceptance test failure of GeodeRedisServerStartupDUnitTest is a known issue.

@dschneider-pivotal dschneider-pivotal merged commit 324ed89 into apache:develop Dec 9, 2021
jvarenina added a commit to Nordix/geode that referenced this pull request Dec 10, 2021
jvarenina added a commit to Nordix/geode that referenced this pull request Dec 10, 2021
@jvarenina jvarenina deleted the feature/GEODE-9854 branch December 10, 2021 13:53
jvarenina added a commit to Nordix/geode that referenced this pull request Dec 17, 2021
mkevo pushed a commit that referenced this pull request Jan 4, 2022
ezoerner pushed a commit to ezoerner/geode that referenced this pull request Mar 17, 2022
ezoerner pushed a commit to ezoerner/geode that referenced this pull request Mar 17, 2022
jvarenina added a commit to Nordix/geode that referenced this pull request Mar 22, 2022