  1. Geode
  2. GEODE-9854

Orphaned .drf files causing memory leak


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.15.0
    • None

    Description

      Issue:
      When an Oplog is compacted, its .drf file is kept because it contains deletes of entries in previous .crf files. The .crf file is deleted, but the orphaned .drf file is not deleted until all
      previous .crf files (.crf files with a smaller id) have been deleted.
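
The retention rule above can be illustrated with a small model. This is a minimal sketch, assuming a disk store whose oplogs are identified by increasing numeric ids; `OplogTracker`, `OplogFiles` and their methods are hypothetical names and are not Geode's `Oplog` or `DiskStoreImpl` API.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of the .crf/.drf lifetime rule described in this issue.
 * Class and method names are illustrative; this is not Geode's Oplog or DiskStoreImpl API.
 */
public class OplogTracker {
    /** Which files of one oplog still exist on disk. */
    private static final class OplogFiles {
        boolean crfExists = true;
        boolean drfExists = true;
    }

    // Ordered by oplog id, so "all oplogs with a smaller id" is a headMap view.
    private final TreeMap<Long, OplogFiles> oplogs = new TreeMap<>();

    public void create(long id) {
        oplogs.put(id, new OplogFiles());
    }

    /** Compaction deletes the oplog's .crf; its .drf may have to stay behind. */
    public void compact(long id) {
        oplogs.get(id).crfExists = false;
        deleteEligibleDrfs();
    }

    /** Deletes every .drf whose own .crf is gone and that has no older .crf left. */
    private void deleteEligibleDrfs() {
        for (Map.Entry<Long, OplogFiles> entry : oplogs.entrySet()) {
            OplogFiles files = entry.getValue();
            if (!files.crfExists && files.drfExists && !olderCrfExists(entry.getKey())) {
                files.drfExists = false;
            }
        }
        // Forget oplogs that no longer have any files.
        oplogs.values().removeIf(files -> !files.crfExists && !files.drfExists);
    }

    private boolean olderCrfExists(long id) {
        return oplogs.headMap(id, false).values().stream().anyMatch(files -> files.crfExists);
    }

    /** True if the oplog's .crf was deleted but its .drf is still kept (an orphaned .drf). */
    public boolean isOrphanedDrf(long id) {
        OplogFiles files = oplogs.get(id);
        return files != null && !files.crfExists && files.drfExists;
    }

    public static void main(String[] args) {
        OplogTracker store = new OplogTracker();
        store.create(1);
        store.create(2);
        store.compact(2); // oplog 1 still has a .crf, so the .drf of oplog 2 is orphaned
        System.out.println(store.isOrphanedDrf(2)); // true
        store.compact(1); // no older .crf remains, so both .drf files can be deleted
        System.out.println(store.isOrphanedDrf(2)); // false
    }
}
```

As long as oplog 1 keeps its .crf, the Oplog object for oplog 2 stays alive only to represent its .drf, which is the window in which any in-memory state it still holds (such as Oplog.regionMap) is wasted.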

      The problem is that the compacted Oplog object representing an orphaned .drf file still holds a structure in memory (Oplog.regionMap) that contains information that is no longer useful
      after compaction, and this structure takes up memory. In addition, a race condition in the .krf creation code can, depending on execution order,
      make the problem more severe: it can leave the pendingKrfTags structure on the regionMap, which can take up a significant amount of memory. This
      pendingKrfTags HashMap is actually empty, but it still consumes memory because it was populated earlier and the HashMap's capacity is not reduced when it is cleared.
      This race condition usually happens when a new Oplog is rolled and the previous Oplog is immediately marked as eligible for compaction. Compaction and .krf creation then start at
      about the same time, and if the compactor runs first it cancels the .krf creation. The pendingKrfTags structure is normally cleared when the .krf file is created, but since compaction canceled the .krf creation, the pendingKrfTags structure remains in memory until the Oplog representing the orphaned .drf file is deleted.
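
The race can be modeled in a few lines. Only the name `pendingKrfTags` comes from this issue; `KrfRaceSketch`, its methods and the `releaseOnCancel` switch are hypothetical. The sketch assumes that whichever of .krf creation and compaction runs first wins, and that on the canceled path the map is emptied but still referenced, which matches the report that the retained HashMap is empty. The "fix" branch simply drops the reference when compaction cancels the .krf; it is an illustration, not necessarily the change that resolved this issue.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the .krf/compaction race. Only the name pendingKrfTags
 * comes from the issue; this is not Geode's Oplog implementation.
 */
public class KrfRaceSketch {
    private Map<Long, Long> pendingKrfTags = new HashMap<>();
    private boolean krfResolved = false;    // true once the .krf is written or canceled
    private final boolean releaseOnCancel;  // false: behavior described in the issue; true: sketched fix

    public KrfRaceSketch(boolean releaseOnCancel) {
        this.releaseOnCancel = releaseOnCancel;
    }

    public synchronized void recordTag(long regionId, long versionTag) {
        if (pendingKrfTags != null) {
            pendingKrfTags.put(regionId, versionTag);
        }
    }

    /** .krf creation path: if it runs first, it writes the .krf and drops the map. */
    public synchronized boolean createKrf() {
        if (krfResolved) {
            return false; // the compactor ran first and canceled .krf creation
        }
        krfResolved = true;
        // ... write the .krf file from pendingKrfTags ...
        pendingKrfTags = null; // the map becomes unreachable and can be garbage collected
        return true;
    }

    /** Compaction path: if it runs first, it cancels the pending .krf creation. */
    public synchronized void compact() {
        if (!krfResolved) {
            krfResolved = true;
            if (releaseOnCancel) {
                pendingKrfTags = null;  // sketched fix: cancellation also drops the map
            } else if (pendingKrfTags != null) {
                pendingKrfTags.clear(); // assumed leaky path: emptied, but still referenced
            }
        }
        // ... copy live entries to the new oplog and delete the .crf ...
    }

    /** True if this compacted oplog still references a pendingKrfTags map. */
    public synchronized boolean retainsPendingKrfTags() {
        return pendingKrfTags != null;
    }

    public static void main(String[] args) {
        // Compactor wins the race, as for oplog#129 in the attached log.
        KrfRaceSketch leaky = new KrfRaceSketch(false);
        leaky.recordTag(1L, 42L);
        leaky.compact();
        leaky.createKrf();
        System.out.println(leaky.retainsPendingKrfTags()); // true: an empty map is still held

        KrfRaceSketch fixed = new KrfRaceSketch(true);
        fixed.recordTag(1L, 42L);
        fixed.compact();
        fixed.createKrf();
        System.out.println(fixed.retainsPendingKrfTags()); // false
    }
}
```

Dropping the reference instead of calling `clear()` matters because `java.util.HashMap.clear()` empties the buckets but keeps the internal table at its grown size, so an emptied map that once held many entries still occupies memory for as long as the orphaned .drf Oplog lives.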

      The log below shows that a .krf file is never created for the orphaned .drf Oplog object (oplog#129) that still has memory allocated in pendingKrfTags:

      server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 drf for disk store store1.
      server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 crf for disk store store1.
      server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 <OplogCompactor store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction oplog id(s): oplog#129
      server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 <OplogCompactor store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates in 41 ms
      server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 <Oplog Delete Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1.
      

      The log below and the heap dump show an orphaned .drf Oplog (oplog#130) that does not have pendingKrfTags allocated in memory. This is because pendingKrfTags is cleared when the .krf file is created, as the log shows.

      server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 drf for disk store store1.
      server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 crf for disk store store1.
      server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 <Idle OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.
      server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 <OplogCompactor store1 for oplog oplog#130> tid=0x5c] OplogCompactor for store1 compaction oplog id(s): oplog#130
      server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 <OplogCompactor store1 for oplog oplog#130> tid=0x5c] compaction did 9918 creates and updates in 64 ms
      server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1.
      server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1.
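
The two log excerpts differ in one respect: oplog#130 logs "Created ... krf" before its .crf is deleted, while oplog#129 never does. A hypothetical helper like the one below could scan a server log for that pattern. It is a sketch that assumes the message format quoted above; `OrphanedDrfLogScan` is not part of Geode, and a match only suggests that the race occurred, it does not prove that memory is retained.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log scanner: finds oplogs whose .crf was deleted without a .krf being created. */
public class OrphanedDrfLogScan {
    // Matches e.g. "Deleted oplog#129 crf for disk store store1."
    private static final Pattern FILE_EVENT = Pattern.compile(
            "(Created|Deleted) oplog#(\\d+) (crf|drf|krf) for disk store (\\S+)\\.");

    public static List<String> crfDeletedWithoutKrf(List<String> lines) {
        Set<String> krfCreated = new HashSet<>();
        Set<String> crfDeleted = new LinkedHashSet<>(); // keeps log order
        for (String line : lines) {
            Matcher m = FILE_EVENT.matcher(line);
            if (!m.find()) {
                continue; // not a file create/delete message
            }
            String key = m.group(4) + " oplog#" + m.group(2);
            String action = m.group(1);
            String fileType = m.group(3);
            if (action.equals("Created") && fileType.equals("krf")) {
                krfCreated.add(key);
            } else if (action.equals("Deleted") && fileType.equals("crf")) {
                crfDeleted.add(key);
            }
        }
        List<String> suspects = new ArrayList<>();
        for (String key : crfDeleted) {
            if (!krfCreated.contains(key)) {
                suspects.add(key);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "[info 2021/11/25 21:52:26.867 CET server1 <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 crf for disk store store1.",
                "[info 2021/11/25 21:52:39.532 CET server1 <Oplog Delete Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1.",
                "[info 2021/11/25 21:52:41.131 CET server1 <Idle OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.",
                "[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1.");
        System.out.println(crfDeletedWithoutKrf(log)); // [store1 oplog#129]
    }
}
```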
      

      Attachments

        1. server1.log
          149 kB
          Jakov Varenina
        2. screenshot-1.png
          136 kB
          Jakov Varenina
        3. screenshot-2.png
          147 kB
          Jakov Varenina


            People

              Assignee: Jakov Varenina (jvarenina)
              Reporter: Jakov Varenina (jvarenina)
              Votes: 0
              Watchers: 3
