To be Reviewed By: Geode Dev

Authors: Jinmei Liao

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

Each locator runs an instance of CMS. Multiple CRUD operations may be invoked at the same time. We need to make sure that these calls are executed serially, not interleaved.

Anti-Goals

N/A

Solution

Each CMS instance needs a dlock (distributed lock) service. When a CRUD operation starts, it must obtain a dedicated dlock before proceeding.

At service initialization time, the CMS creates this dlock service:

private static DistributedLockService getCMSLockService(InternalDistributedSystem ds) {
    // Reuse the service if this member has already created it.
    DistributedLockService cmsLockService = DLockService.getServiceNamed(CMS_NAME);
    try {
      if (cmsLockService == null) {
        cmsLockService = DLockService.create(CMS_NAME, ds, true, true);
      }
    } catch (IllegalArgumentException ignore) {
      // Another thread created the service in the meantime; look it up again.
      return DLockService.getServiceNamed(CMS_NAME);
    }
    return cmsLockService;
}


Then each CRUD operation acquires the lock at its beginning and releases it at its end:

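public boolean lockCMS() {
   // Wait time -1: block until the lock is acquired.
   // Lease time -1: hold the lock until it is explicitly unlocked.
   return cmsLockService.lock(CMS_NAME, -1, -1);
}

public void unlockCMS() {
   cmsLockService.unlock(CMS_NAME);
}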
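
Because the lock is held until it is explicitly released, a CRUD operation that throws between lockCMS() and unlockCMS() would block every later operation. A minimal sketch of guarding against this with try/finally follows. It assumes a hypothetical helper named withCmsLock, and it uses a local ReentrantLock as a stand-in for the dlock so the example runs without a cluster; in the CMS, lockCMS() and unlockCMS() would delegate to cmsLockService as shown above.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Illustrative only: CmsLockSketch and withCmsLock are hypothetical names.
final class CmsLockSketch {
  // Stand-in for the distributed lock; a ReentrantLock only serializes
  // callers inside one JVM, whereas the dlock serializes across locators.
  private static final ReentrantLock STAND_IN_LOCK = new ReentrantLock();

  public static boolean lockCMS() {
    STAND_IN_LOCK.lock();
    return true;
  }

  public static void unlockCMS() {
    STAND_IN_LOCK.unlock();
  }

  public static boolean isLocked() {
    return STAND_IN_LOCK.isLocked();
  }

  // Runs one CRUD operation while holding the lock and always releases it,
  // even when the operation throws.
  public static <T> T withCmsLock(Supplier<T> operation) {
    if (!lockCMS()) {
      throw new IllegalStateException("could not obtain the CMS lock");
    }
    try {
      return operation.get();
    } finally {
      unlockCMS();
    }
  }

  public static void main(String[] args) {
    String result = withCmsLock(() -> "region created");
    System.out.println(result);
  }
}
```

Wrapping the operation in a single helper keeps the lock/unlock pairing in one place instead of repeating it in every create, update, and delete method.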

Changes and Additions to Public Interfaces

N/A

Performance Impact

There would be a slight performance impact, since every CRUD operation through CMS will now be serialized, even if the operations are initiated on different locators. But these operations are infrequent (they change the cluster configuration, for example creating or deleting regions and indexes), so the performance impact can be tolerated.

Backwards Compatibility and Upgrade Path

Will the regular rolling upgrade process work with these changes? Yes

How do the proposed changes impact backwards-compatibility? Are message or file formats changing? No

Is there a need for a deprecation process to provide an upgrade path to users who will need to adjust their applications? No

Prior Art

What would be the alternatives to the proposed solution? What would happen if we don’t solve the problem? Why should this proposal be preferred?

FAQ

Answers to questions you’ve commonly been asked after requesting comments for this proposal.

Errata

What are minor adjustments that had to be made to the proposal since it was approved?


3 Comments

  1. Could some performance impact be mitigated somewhat by having separate locks for different areas of the cluster config? Creating/deleting a region should have no impact on changing the configured MethodInvocationAuthorizer, and vice versa, for example, so it should be safe to perform both operations in parallel.

    1. Good point, Donal.


      With a CRUD operation, CMS knows the following:

      1. the type of the configuration
      2. the ID of the configuration

      We could mitigate the performance impact by having separate locks for different types, for different IDs, or for both. With a dlock per type, all region CRUD operations are serialized, but an index operation can run in parallel with a region operation. With a dlock per ID, a region named "foo" can be created in parallel with another region named "bar", but an operation on an index named "foo" has to wait until the operation on the region named "foo" is done. Or we can combine both. It all depends on how much mitigation we want.

  2. Having one common dlock for all CMS operations:

    Pro:

    1. ease of implementation/understanding
    2. gfsh can also use this lock to lock in the gfsh command level


    Con:

    1. all operations will be executed in serial.


    Having a dlock per type or ID for CMS Operations:

    Pro:

    1. allow some level of concurrency

    Con:

    1. a bit harder if we want gfsh to participate in the locking
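
The per-type and per-ID alternatives discussed in the comments come down to which lock name a CRUD operation asks for. The sketch below shows one way to derive that name. Everything in it is an assumption made for illustration: the class CmsLockNames, the Granularity enum, the name format, and the "CMS" value of CMS_NAME are hypothetical, and "combine both" is read here as one lock per (type, ID) pair, which is only one possible interpretation. A single dlock service can hold many named locks, so the resulting string could be passed as the name argument of lock and unlock.

```java
// Illustrative only: none of these names come from the proposal.
final class CmsLockNames {
  // Hypothetical value; the proposal does not state what CMS_NAME is.
  public static final String CMS_NAME = "CMS";

  public enum Granularity { GLOBAL, PER_TYPE, PER_ID, PER_TYPE_AND_ID }

  // Returns the lock name an operation on the given configuration would use.
  // Two operations are serialized exactly when their lock names are equal.
  public static String lockNameFor(Granularity granularity, String type, String id) {
    switch (granularity) {
      case PER_TYPE:
        return CMS_NAME + "/type/" + type;
      case PER_ID:
        return CMS_NAME + "/id/" + id;
      case PER_TYPE_AND_ID:
        return CMS_NAME + "/" + type + "/" + id;
      default:
        // GLOBAL: the single common lock from the proposal.
        return CMS_NAME;
    }
  }

  public static void main(String[] args) {
    System.out.println(lockNameFor(Granularity.PER_TYPE, "region", "foo"));
    System.out.println(lockNameFor(Granularity.PER_ID, "index", "foo"));
  }
}
```

With PER_TYPE, operations on regions "foo" and "bar" map to the same name and are serialized, while an index operation gets a different name. With PER_ID, region "foo" and index "foo" share a name, while region "bar" does not.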