Add support for Hadoop 3.x and Flink Yarn sessions #115
base: master
The scripts bin/{start,stop}-webclient.sh no longer exist as of Flink release-1.0.0.
Wow, that's a lot of work. Thanks, Viktor!
```diff
@@ -118,7 +152,7 @@ object Model {
 * @param c The HOCON config to use when constructing the model.
 * @param prefix The prefix path which has to be rendered.
 */
-class Yaml(val c: Config, val prefix: String) extends util.HashMap[String, Object] with Model {
+class KeyValue(val c: Config, val prefix: String) extends util.HashMap[String, Object] with Model {
```
If we are using KeyValue as a base class for only one concrete subclass (Yaml), what is the purpose of factoring it out now?
Hmm, I'm not sure anymore. It could just be that I wanted to distinguish different types of usage. For example, Model.Yaml is also used for log4j.properties, which isn't really a YAML file.
I added documentation for the INI format model.
Let’s assume that Flink is set up with YARN and HDFS as dependencies, i.e., the dependency graph looks like this:

    flink-1.7.2 -> yarn-3.1.1
    flink-1.7.2 -> hdfs-3.1.1
    hdfs-3.1.1 -> ()
    yarn-3.1.1 -> ()

To determine the order in which dependencies are set up, the graph is reversed:

    hdfs-3.1.1 -> flink-1.7.2
    yarn-3.1.1 -> flink-1.7.2
    flink-1.7.2 -> ()

The graph is then traversed by starting with the nodes that have in-degree > 0 in the original graph (here hdfs-3.1.1 and yarn-3.1.1) and adding their successors in the reversed graph, i.e., the services that depend on them, to the list of nodes to visit. If DFS order is used, the following activation order is possible:

    hdfs-3.1.1, flink-1.7.2, yarn-3.1.1

That is because the traversal starts with hdfs-3.1.1 and then follows the edge to flink-1.7.2 before continuing with yarn-3.1.1. However, if a Flink YARN session is used, Flink needs to connect to YARN at startup. Therefore, all dependencies of Flink have to be activated before it. This is achieved by traversing the graph in BFS order, which activates both hdfs-3.1.1 and yarn-3.1.1 before flink-1.7.2.
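To make the DFS/BFS difference concrete, here is a minimal Scala sketch that replays the Flink/YARN/HDFS example. It is a hypothetical illustration, not the PR's code: services are plain strings, and the names `ActivationOrder`, `reverse`, `startNodes`, `dfs` and `bfs` are made up for this sketch. Start nodes are sorted alphabetically so that, as in the description, the traversal begins with hdfs-3.1.1.

```scala
import scala.collection.mutable

// Hypothetical sketch: service names are plain strings and the graph is the
// example from the PR description. This is not the PR's actual implementation.
object ActivationOrder {
  // Original graph: service -> services it depends on.
  val deps: Map[String, List[String]] = Map(
    "flink-1.7.2" -> List("yarn-3.1.1", "hdfs-3.1.1"),
    "hdfs-3.1.1" -> Nil,
    "yarn-3.1.1" -> Nil
  )

  // Reverse every edge: dependency -> services that depend on it.
  def reverse(g: Map[String, List[String]]): Map[String, List[String]] =
    g.toList
      .flatMap { case (svc, ds) => ds.map(d => d -> svc) }
      .groupBy(_._1)
      .map { case (d, pairs) => d -> pairs.map(_._2).sorted }

  // Nodes with in-degree > 0 in the original graph, sorted for a stable order.
  def startNodes(g: Map[String, List[String]]): List[String] =
    g.values.flatten.toList.distinct.sorted

  // Pre-order DFS over the reversed graph.
  def dfs(starts: List[String], rev: Map[String, List[String]]): List[String] = {
    val visited = mutable.LinkedHashSet.empty[String]
    def visit(n: String): Unit =
      if (visited.add(n)) rev.getOrElse(n, Nil).foreach(visit)
    starts.foreach(visit)
    visited.toList
  }

  // BFS over the reversed graph, seeded with all start nodes at once.
  def bfs(starts: List[String], rev: Map[String, List[String]]): List[String] = {
    val visited = mutable.LinkedHashSet.empty[String]
    val queue = mutable.Queue.empty[String]
    starts.foreach(s => if (visited.add(s)) queue.enqueue(s))
    while (queue.nonEmpty) {
      val n = queue.dequeue()
      rev.getOrElse(n, Nil).foreach(m => if (visited.add(m)) queue.enqueue(m))
    }
    visited.toList
  }

  def main(args: Array[String]): Unit = {
    val rev = reverse(deps)
    val starts = startNodes(deps)
    println("DFS: " + dfs(starts, rev).mkString(", ")) // DFS: hdfs-3.1.1, flink-1.7.2, yarn-3.1.1
    println("BFS: " + bfs(starts, rev).mkString(", ")) // BFS: hdfs-3.1.1, yarn-3.1.1, flink-1.7.2
  }
}
```

BFS satisfies the constraint in this example because every dependency is seeded into the queue before any dependent is reached. In deeper graphs where the start nodes themselves depend on each other, BFS alone does not guarantee this; a topological sort would be the stricter option.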
The key-value model is useful for any configuration file where keys are pre-defined and have to be present. Hadoop’s capacity-scheduler.xml is such a file: if it is empty, YARN will not start. However, if we use Model.Site to pre-define default values for the different configuration keys, we lose the helpful descriptions that are stored in the file. Actually, Model.Yaml should be similar to Model.Site and use the key-value pairs stored in the properties HashMap, but with a different syntax. The flink-conf.yaml file should be generated with Model.KeyValue to keep the helpful comments.
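The idea of keeping a file's comments while filling in values for its pre-defined keys can be sketched as follows. This is a hypothetical illustration, not the PR's Model.KeyValue: the real class extends util.HashMap and is built from a HOCON Config, whereas this sketch takes a plain Map of overrides. It also assumes a flat `key: value` syntax as in flink-conf.yaml; the template text is an illustrative excerpt, not the file shipped with Flink.

```scala
// Hypothetical sketch, not the PR's Model.KeyValue: it overrides the values of
// keys that are already present in a commented template and leaves every other
// line (comments, blank lines, untouched defaults) exactly as it was.
object KeyValueSketch {
  // Assumed flat "key: value" syntax as in flink-conf.yaml (no nesting).
  private val Entry = """^([A-Za-z0-9_.\-]+):\s*(.*)$""".r

  def render(template: String, overrides: Map[String, String]): String =
    template
      .split("\n")
      .map {
        case line if line.trim.startsWith("#") => line // keep comments verbatim
        case Entry(key, _) if overrides.contains(key) => s"$key: ${overrides(key)}"
        case line => line // keep defaults and blank lines
      }
      .mkString("\n")

  def main(args: Array[String]): Unit = {
    // Illustrative excerpt in the style of flink-conf.yaml.
    val template =
      """# The host on which the JobManager runs.
        |jobmanager.rpc.address: localhost
        |
        |# The port on which the JobManager is reachable.
        |jobmanager.rpc.port: 6123""".stripMargin
    println(render(template, Map("jobmanager.rpc.address" -> "master")))
  }
}
```

In this sketch the template decides which keys exist: overrides for keys the template does not contain are silently ignored, so every pre-defined key stays present and keeps its description.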
Unfortunately, this does not fix the warning in the log file that HADOOP_CONF_DIR or YARN_CONF_DIR is missing. The Flink documentation recommends using these environment variables over this configuration option.
Summary of changes: