Wednesday, September 28, 2022

Fine-Tune Fair to Capacity Scheduler in Relative Mode


Cloudera Data Platform (CDP) unifies the technologies from Cloudera Enterprise Data Hub (CDH) and Hortonworks Data Platform (HDP). A few functionalities that existed in the legacy platforms have been replaced by alternatives chosen after detailed and careful analysis. CDH users would have used Fair Scheduler (FS), and HDP users would have used Capacity Scheduler (CS). After thoroughly analyzing the YARN schedulers available in the legacy platforms, Cloudera chose Capacity Scheduler as the supported YARN scheduler for CDP. We have since merged functionality between the two schedulers, minimizing the impact on CDH users going through this transition. 

In the previous blog posts The Four Paths to CDP and Choosing Your Upgrade or Migration Path, we covered the overall business and technical considerations involved in moving your legacy platform to CDP. In the CDH to CDP and HDP to CDP upgrade blog posts, we walked through the overall technical process of the upgrade and provided video demonstrations for each legacy distribution. In this blog we shift our focus to a specific area that deserves special attention while upgrading or migrating from CDH to CDP.

To make upgrading from CDH to CDP easier, Cloudera provides the fs2cs conversion utility. This utility automatically converts certain Fair Scheduler configurations to Capacity Scheduler configurations, as part of the Upgrade Cluster Wizard in Cloudera Manager. Some of the features of Capacity Scheduler are unique and not mirrored in Fair Scheduler. Hence, the fs2cs conversion utility cannot convert every Fair Scheduler configuration into a corresponding Capacity Scheduler configuration. (Examples of such configurations are discussed in the later sections of this document.) After the fs2cs tool is used for the initial conversion of scheduler properties, some manual fine-tuning is required to ensure that the resulting scheduling configuration will fit into your organization’s internal resource allocation goals and workload SLAs. 

This blog lists certain configurations of Capacity Scheduler that require fine-tuning after upgrading to CDP in order to mimic some of the Fair Scheduler behavior from before the upgrade. This fine-tuning lets you match CDP Capacity Scheduler settings to some of the thresholds previously set in the Fair Scheduler. CDP Private Cloud Base 7.1.6 introduced a new mode for allocating resources to queues called "weight mode." This blog focuses on the older "relative mode," which is present in all versions of CDP Private Cloud Base.

Cloudera fs2cs conversion utility

For detailed information about the fs2cs conversion utility, how it works internally, examples, and limitations, see this previous blog post by Cloudera.

For detailed instructions about the scheduler transition process including migrating the YARN settings from Fair Scheduler to Capacity Scheduler, see the Cloudera upgrade documentation.

Scheduler configurations: quick review

Fair Scheduler in CDH

  • A specified weight is used to calculate the amount of fair resources for each queue
  • Fair shares for all queues are recalculated each time a new queue is created
    • For more details on fair share calculations please refer to this blog
  • The value set for “maximum resources” configuration is a hard limit
  • The value set for “maximum running apps” configuration is a hard limit
  • FS does not allow you to set resource limits on individual users 
    • One user can use resources up to the maximum hard limit of the queue
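The weight-based fair share calculation described above can be sketched as follows. This is a minimal illustration of the arithmetic, not Fair Scheduler's actual implementation; the queue names and weights are hypothetical:

```python
def fair_shares(weights, cluster_memory_mb):
    """Each queue's fair share is its weight divided by the total weight of
    all queues, applied to the cluster resource (memory shown here)."""
    total_weight = sum(weights.values())
    return {queue: round(w / total_weight * cluster_memory_mb)
            for queue, w in weights.items()}

# Hypothetical queues on a 300 GB cluster:
print(fair_shares({"etl": 2, "adhoc": 1}, 300_000))
# → {'etl': 200000, 'adhoc': 100000}

# Adding a new queue recalculates every queue's fair share dynamically:
print(fair_shares({"etl": 2, "adhoc": 1, "ml": 1}, 300_000))
# → {'etl': 150000, 'adhoc': 75000, 'ml': 75000}
```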

Capacity Scheduler in HDP

  • Configured capacity is used to calculate the capacity of each queue
    •  Configured capacity of all child queues for each parent should sum up to 100%
  • Maximum capacity specified for each queue is a hard limit
  • Maximum applications configurable for each queue is a hard limit
  • CS provides options to control resource assignment to different users within a queue
  • “User limit factor” controls the maximum quantity of resources that a single user can consume within a queue
    • The value set for this configuration is a hard limit
    • Value of this configuration is set as a multiple of the queue's configured capacity 
      • Value of 1 means the user can consume the entire configured capacity of the queue
      • Value greater than 1 allows the user to go beyond the configured capacity
      • Value less than 1 (such as 0.5) allows the user to obtain only that fraction of the configured capacity
    • For more information about the user limit factor, see setting user limits 
  • “Minimum user percentage” is the smallest quantity of resources a single user should get during a request
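The user limit factor rule above can be expressed as a single per-user cap: a user may consume up to the user limit factor times the queue's configured capacity, but never more than the queue's maximum capacity. A minimal sketch with hypothetical queue numbers:

```python
def user_resource_cap(configured_capacity_pct, max_capacity_pct, user_limit_factor):
    """A single user may consume up to user_limit_factor times the queue's
    configured capacity, capped by the queue's maximum capacity."""
    return min(user_limit_factor * configured_capacity_pct, max_capacity_pct)

# Hypothetical queue: configured capacity 25%, maximum capacity 60%.
print(user_resource_cap(25, 60, 1))    # 25 -> user stops at configured capacity
print(user_resource_cap(25, 60, 2))    # 50 -> user may exceed configured capacity
print(user_resource_cap(25, 60, 0.5))  # 12.5 -> user gets half the configured capacity
```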

Scheduler comparison: from legacy platforms

The following table gives a quick side-by-side comparison of some of the features in Fair Scheduler in CDH and Capacity Scheduler in HDP.

| Fair Scheduler (CDH) | Capacity Scheduler (HDP) |
| --- | --- |
| Weight based: automatic fair share calculation | Percentage capacity based or absolute resource configuration based |
| When a new queue is added, fair shares for all queues are recalculated dynamically | When a new child queue is added, the capacity of any sibling queues under the same parent must be reconfigured |
| Hard limits for queues: the values set for "max resources" and "max running apps" | Hard limits for queues: the "maximum capacity" and "maximum applications" defined for each queue |
| No option to define resource limits among users within a queue | Resource assignment among users within a queue is controlled by "user limit factor" (hard limit) and "min user percentage" (soft limit) |

 

New features in Capacity Scheduler in CDP

Below are a few of the newly added features to Capacity Scheduler in CDP:

  • Capacity scheduler supports three modes of resource allocation in CDP:
    • Relative: based on percentages of total resources (same as HDP)
    • Absolute: based on absolute values for hardware attributes, such as memory or vCores
    • Weight: based on fractions of total resources (like weighted queues in CDH)

For more information about these resource allocation modes, check out our resource allocation overview.
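The difference between relative and weight mode can be sketched with a toy allocator. This illustrates the arithmetic only, not actual scheduler code; the queue names and values are hypothetical:

```python
def allocate(total_memory_mb, queue_spec, mode):
    """Split a cluster resource across sibling queues under the two modes."""
    if mode == "relative":
        # Percentages of the parent's capacity; siblings must sum to 100.
        assert sum(queue_spec.values()) == 100
        return {q: pct / 100 * total_memory_mb for q, pct in queue_spec.items()}
    if mode == "weight":
        # Fractions of the total weight, like weighted queues in CDH.
        total_w = sum(queue_spec.values())
        return {q: w / total_w * total_memory_mb for q, w in queue_spec.items()}
    raise ValueError(f"unknown mode: {mode}")

print(allocate(100_000, {"a": 60, "b": 40}, "relative"))  # a: 60000.0, b: 40000.0
print(allocate(100_000, {"a": 3, "b": 1}, "weight"))      # a: 75000.0, b: 25000.0
```

Note that in weight mode, adding a queue simply dilutes every sibling's share, whereas in relative mode the percentages of the siblings must be rebalanced to keep summing to 100.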

  • Dynamic Queue Scheduling: Technical Preview in CDP Private Cloud Base 7.1.7
    • Queues are created automatically at runtime
    • Restarting the YARN service deletes all dynamically created queues
    • Dynamic queues are managed differently depending on the resource allocation mode
    • See the Cloudera documentation for more information on dynamic queues

Example: using the fs2cs conversion utility

You can use the fs2cs conversion utility to automatically convert certain Fair Scheduler configurations to Capacity Scheduler configurations as a part of the Upgrade Cluster Wizard in Cloudera Manager. Refer to the official Cloudera documentation for usage details of fs2cs. This tool can also be used to generate a Capacity Scheduler configuration during a CDH-to-CDP side-car migration.

  1. Download the Fair Scheduler configuration files from the Cloudera Manager
  2. Use the fs2cs conversion utility to auto convert the structure of resource pools
  3. Upload the generated Capacity Scheduler configuration files to save the configuration in Cloudera Manager

Fair Scheduler configurations from CDH: before upgrade

As an example, let’s consider the following dynamic resource pools configuration defined for Fair Scheduler in CDH. 

Capacity Scheduler in Relative Mode from CDP: after upgrade

As part of the upgrade to CDP, the fs2cs conversion utility converts the Fair Scheduler configurations to the corresponding Relative Mode in Capacity Scheduler. The following screenshots show the resulting Relative Mode Capacity Scheduler configurations in YARN Queue Manager.

Observations (in Relative Mode for CS)

  • All queues have their max capacity configured as 100% after the conversion using the fs2cs conversion utility.
    • In FS, some of the queues had “maximum resources” configured using absolute values, and those were hard limits
    • Therefore, hard limits for queues based on “maximum resources” that were present in FS in CDH need some fine-tuning after migration to CS in CDP
    • In CS the maximum capacity is relative to the parent queue, while in FS “maximum resources” is configured as a global limit
  • All queues have the user limit factor set to 1 (the default) after the conversion using the fs2cs conversion utility.
    • A value of 1 means that one user can only use up to the configured capacity of the queue
    • If a single user needs to go beyond the configured capacity and utilize up to the queue’s maximum capacity, this value needs to be adjusted
    • In CDH, many applications would have been using a single tenant (user ID) to run their jobs on the cluster. In those cases, the default user limit factor of 1 can leave jobs in a pending state even when the cluster has available capacity.
  • Ordering policies within a specific queue.
    • Capacity Scheduler supports two job ordering policies within a specific queue: FIFO (First In, First Out) and Fair. Ordering policies are configured on a per-queue basis. The default ordering policy in Capacity Scheduler is FIFO for any newly added queue. For queues converted using fs2cs, however, the ordering policy is set to “fair” if DRF was used as the scheduling policy in the corresponding Fair Scheduler configuration. To switch the ordering policy for a queue to “fair,” edit the queue properties in YARN Queue Manager and update the value of “yarn.scheduler.capacity.<queue-path>.ordering-policy”.

Manual fine-tuning (in Relative Mode for CS)

As mentioned previously, there is no one-to-one mapping for all Fair Scheduler and Capacity Scheduler configurations. A few manual configuration changes should be made in the CDP Capacity Scheduler to simulate some of the CDH Fair Scheduler settings. For example, we can fine-tune the maximum capacity in the CDP Capacity Scheduler to reproduce some of the hard limits previously defined in the CDH Fair Scheduler using Max Resources. Also, because CDH had no option to restrict resource consumption by individual users within a queue, one user could consume all of the resources within a queue. To preserve that behavior, the user limit factor in the CDP Capacity Scheduler must be tuned to allow individual users to go beyond the configured capacity, up to the maximum capacity of the queue.

We can use the calculations listed below as a starting point to fine-tune the CDP Capacity Scheduler in Relative Mode. This creates an environment with similar capacity limits for users that were previously defined in Fair Scheduler. 

The calculations are done using the settings defined in YARN as well as in CDH Fair Scheduler. 

  • Configured Capacity
    • Configured Capacity = Round({weight configured for this queue in Fair Scheduler} / {total of the weights of all sibling queues} × 100) to 2 digits
  • Max Capacity – if Maximum Resources are defined as absolute values for vCores and memory in Fair Scheduler
    • Max Capacity = Round(Max({max vCores configured for this queue in Fair Scheduler} / {total vCores for YARN} × 100, {max memory configured for this queue in Fair Scheduler} / {total memory for YARN} × 100)) to 2 digits
  • Max Capacity – if Maximum Resources are defined as a common percentage for vCores and memory in Fair Scheduler
    • Max Capacity = the common percentage defined for Max Resources for this queue in Fair Scheduler 
  • Max Capacity – if Maximum Resources are defined as separate percentages for vCores and memory in Fair Scheduler
    • Max Capacity = Max(percentage defined for Max Resources for vCores in Fair Scheduler for this queue, percentage defined for Max Resources for memory in Fair Scheduler for this queue)
  • User Limit Factor
    • User Limit Factor = Round({calculated Max Capacity for this queue in Capacity Scheduler} / {Configured Capacity for this queue in Capacity Scheduler}) to 2 digits
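The calculations above can be sketched in code. This is a worked illustration with hypothetical queue and cluster numbers, not an official tool:

```python
def configured_capacity(queue_weight, sibling_weights_total):
    """Relative-mode configured capacity derived from the CDH FS weight."""
    return round(queue_weight / sibling_weights_total * 100, 2)

def max_capacity_from_absolute(max_vcores, total_vcores, max_memory, total_memory):
    """Max capacity when FS 'maximum resources' were absolute vCores/memory values:
    take the larger of the two percentages so neither limit is tightened."""
    return round(max(max_vcores / total_vcores * 100,
                     max_memory / total_memory * 100), 2)

def user_limit_factor(max_capacity_pct, configured_capacity_pct):
    """Lets one user grow from the configured capacity up to the max capacity."""
    return round(max_capacity_pct / configured_capacity_pct, 2)

# Hypothetical queue: weight 2 out of a sibling total of 8; FS max resources of
# 40 vCores / 120 GB on a 100-vCore / 400 GB YARN cluster.
cap = configured_capacity(2, 8)                                  # 25.0
max_cap = max_capacity_from_absolute(40, 100, 120_000, 400_000)  # 40.0
ulf = user_limit_factor(max_cap, cap)                            # 1.6
```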

Fine-tuned scheduler comparison (in Relative Mode for CS)

After upgrading to CDP, we can use the calculations suggested above, along with the configurations previously present in the CDH Fair Scheduler, to fine-tune the CDP Capacity Scheduler. This fine-tuning simulates some of the previous CDH Fair Scheduler settings within the CDP Capacity Scheduler. If such a simulation is not required for your environment and use cases, you can skip this fine-tuning exercise; in that case, a freshly upgraded CDP environment with the new Capacity Scheduler is an ideal opportunity to revisit and adjust your YARN queue resource allocations from scratch.

A side-by-side comparison of the CDH Fair Scheduler and fine-tuned CDP Capacity Scheduler used in the above example is provided below.

Summary

Capacity Scheduler is the default and supported YARN scheduler in CDP Private Cloud Base. When upgrading or migrating from CDH to CDP Private Cloud Base, the migration from Fair Scheduler to Capacity Scheduler is done automatically using the fs2cs conversion utility. From CDP Private Cloud Base 7.1.6 onwards, the fs2cs conversion utility converts Fair Scheduler configurations into the new Weight Mode in Capacity Scheduler; in prior versions, it converts them to Relative Mode. Because of the feature differences between Fair Scheduler and Capacity Scheduler, a direct one-to-one mapping of all configurations is not possible. In this blog, we presented some calculations that can be used as a starting point for the manual fine-tuning required to match CDP Capacity Scheduler settings in Relative Mode to some of the thresholds previously set in the Fair Scheduler. A similar fine-tuning for CDP Capacity Scheduler in Weight Mode will be covered in a follow-on blog post.

To learn more about Capacity Scheduler in CDP, here are some helpful resources: 

Comparison of Fair Scheduler with Capacity Scheduler

CDP Resource scheduling and management

Upgrade to CDP
