
Add support for Hadoop 3.x and Flink Yarn sessions #115

Open
he-sk wants to merge 26 commits into base: master
Conversation

@he-sk commented Nov 19, 2019

Summary of changes:

  • Add support for Hadoop 3.x
  • Fix order when starting system dependencies
  • Add support for Flink Yarn sessions
  • Add support to generate INI type configuration files (e.g., Hadoop's container-executor.cfg)
  • Generate additional Hadoop configuration files to support accelerators

@he-sk changed the title from "Flink yarn" to "Add support for Hadoop 3.x and Flink Yarn sessions" on Nov 19, 2019
@aalexandrov (Member) left a comment


Wow, that's a lot of work. Thanks, Viktor!

@@ -118,7 +152,7 @@ object Model {
  * @param c The HOCON config to use when constructing the model.
  * @param prefix The prefix path which has to be rendered.
  */
-class Yaml(val c: Config, val prefix: String) extends util.HashMap[String, Object] with Model {
+class KeyValue(val c: Config, val prefix: String) extends util.HashMap[String, Object] with Model {
@aalexandrov (Member):
If we are using KeyValue as a base class for only one concrete subclass (Yaml), what is the purpose of factoring it out now?

@he-sk (Author):
Hmm, I'm not sure anymore. It could just be that I wanted to distinguish different types of usage. For example, Model.Yaml is also used for log4j.properties, which isn't really a YAML file.

he-sk commented Nov 21, 2019

I added documentation for the INI format model.

Let’s assume that Flink is set up with YARN and HDFS as dependencies, i.e., the dependency graph looks like this:

flink-1.7.2 -> yarn-3.1.1
flink-1.7.2 -> hdfs-3.1.1
hdfs-3.1.1 -> ()
yarn-3.1.1 -> ()

To determine the order in which dependencies are set up, the graph is reversed.

hdfs-3.1.1 -> flink-1.7.2
yarn-3.1.1 -> flink-1.7.2
flink-1.7.2 -> ()

The graph is then traversed starting from the nodes with in-degree 0 and following edges to add their successors to the list of nodes to visit. If DFS order is used, the following activation order is possible:

hdfs-3.1.1, flink-1.7.2, yarn-3.1.1

That is because the traversal starts with hdfs-3.1.1 and then follows the edge to flink-1.7.2 before continuing with yarn-3.1.1.

However, if a Flink YARN session is used, then Flink needs to connect to YARN at startup. Therefore, all dependencies of Flink have to be activated before it. This is achieved by traversing the graph in BFS order.
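The contrast between the two traversal orders can be sketched as follows. This is a hypothetical Python illustration of the ordering problem, not the project's actual (Scala) implementation; only the node names come from the example above:

```python
from collections import deque

# Dependency graph from the example: an edge u -> v means "u depends on v".
deps = {
    "flink-1.7.2": ["yarn-3.1.1", "hdfs-3.1.1"],
    "hdfs-3.1.1": [],
    "yarn-3.1.1": [],
}

# Reverse the edges: they now point from a dependency to its dependents.
rev = {n: [] for n in deps}
for node, targets in deps.items():
    for t in targets:
        rev[t].append(node)

# Traversal roots: nodes with in-degree 0 in the reversed graph.
indeg = {n: 0 for n in rev}
for targets in rev.values():
    for t in targets:
        indeg[t] += 1
roots = [n for n in rev if indeg[n] == 0]

def dfs_order(roots, graph):
    """Depth-first activation order: may start a dependent too early."""
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        order.append(n)
        for m in graph[n]:
            visit(m)
    for r in roots:
        visit(r)
    return order

def bfs_order(roots, graph):
    """Breadth-first activation order: all roots before any dependent."""
    order, seen, queue = [], set(roots), deque(roots)
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in graph[n]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return order

print(dfs_order(roots, rev))  # DFS reaches flink-1.7.2 before yarn-3.1.1
print(bfs_order(roots, rev))  # BFS activates flink-1.7.2 last
```

With DFS the traversal reaches flink-1.7.2 via hdfs-3.1.1 before yarn-3.1.1 is activated; with BFS both dependencies are visited before Flink, which is what a YARN session needs.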

The key-value model is useful for any configuration file whose keys are pre-defined and have to be present. Hadoop’s capacity-scheduler.xml is such a file: if it is empty, YARN will not start. However, if we use Model.Site to pre-define default values for the different configuration keys, we lose the helpful descriptions stored in the file.
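As a rough illustration of the idea (not the project's actual API; the key names and defaults below are invented for the example), a key-value renderer can merge user overrides into a table of mandatory defaults so that every required key is always emitted:

```python
# Hypothetical defaults for two mandatory capacity-scheduler keys.
DEFAULTS = {
    "yarn.scheduler.capacity.maximum-applications": "10000",
    "yarn.scheduler.capacity.root.queues": "default",
}

def render_key_value(overrides, sep="="):
    """Render defaults merged with overrides, one `key<sep>value` line each."""
    merged = {**DEFAULTS, **overrides}
    return "\n".join(f"{k}{sep}{v}" for k, v in merged.items())

# INI-style output ("=") vs. flink-conf.yaml-style output (": ").
ini = render_key_value({"yarn.scheduler.capacity.root.queues": "default,gpu"})
yaml = render_key_value({}, sep=": ")
```

Because the defaults are always merged in, a mandatory key can never go missing from the generated file, and the same pairs can be rendered with either separator syntax.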

Actually, Model.Yaml should be similar to Model.Site and use the key-value pairs stored in the properties HashMap, just with a different syntax. The flink-conf.yaml file should be generated with Model.KeyValue to keep the helpful comments.
Unfortunately, this does not fix the warning in the log file that HADOOP_CONF_DIR or YARN_CONF_DIR is missing. The Flink documentation prefers environment variables over this configuration option.
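A minimal sketch of the environment-variable route (the configuration path is an assumption for illustration, and the launch command is shown only as a comment):

```python
import os

# Point Flink at the Hadoop configuration via the environment, as the Flink
# documentation recommends, instead of a flink-conf.yaml entry.
env = dict(os.environ)
env.setdefault("HADOOP_CONF_DIR", "/etc/hadoop/conf")  # example path

# A detached YARN session could then be launched with this environment, e.g.:
# subprocess.run(["bin/yarn-session.sh", "-d"], env=env)
```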