From 74f3061d88ce3155daca921c23266801533e7710 Mon Sep 17 00:00:00 2001
From: James Xu
Date: Mon, 13 May 2024 13:38:28 +0800
Subject: [PATCH] [GLUTEN-5708][VL] Minor wording polishing for NewToGluten.md (#5707)

---
 docs/developers/NewToGluten.md | 62 +++++++++++++++-------------------
 1 file changed, 28 insertions(+), 34 deletions(-)

diff --git a/docs/developers/NewToGluten.md b/docs/developers/NewToGluten.md
index 1eb21d1e6c05..a397003adf36 100644
--- a/docs/developers/NewToGluten.md
+++ b/docs/developers/NewToGluten.md
@@ -6,22 +6,20 @@ parent: Developer Overview
 ---
 Help users to debug and test with gluten.
 
-For intel internal developer, you could refer to internal wiki [New Employee Guide](https://wiki.ith.intel.com/display/HPDA/New+Employee+Guide) to get more information such as proxy settings,
-Gluten has cpp code and java/scala code, we can use some useful IDE to read and debug.
-
 # Environment
 
 Now gluten supports Ubuntu20.04, Ubuntu22.04, centos8, centos7 and macOS.
 
-## Openjdk8
+## OpenJDK 8
 
-### Environment setting
+### Environment Setting
 
-For root user, the environment variables file is `/etc/profile`, it will make effect for all the users.
+For root user, the environment variables file is `/etc/profile`, it will take effect for all the users.
 
 For other user, you can set in `~/.bashrc`.
 
-### Guide for ubuntu
+### Guide for Ubuntu
+
 The default JDK version in ubuntu is java11, we need to set to java8.
 
 ```bash
@@ -43,9 +41,9 @@ export PATH="$PATH:$JAVA_HOME/bin"
 
 > Must set PATH with double quote in ubuntu.
 
-## Openjdk17
+## OpenJDK 17
 
-By defaults, Gluten compiles package using JDK8. Add maven profile `-Pjava-17` changing to use JDK17, and please make sure your JAVA_HOME points to jdk17.
+By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17, and please make sure your JAVA_HOME points to jdk17.
 
 Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206).
 So please add following configs in `spark-defaults.conf`:
@@ -78,31 +76,20 @@ If you need to debug the tests in /gluten-ut, You need to compile java c
 
 # Java/scala code development with Intellij
 
-## Linux intellij local debug
+## Linux IntelliJ local debug
 
-Install the linux intellij version, and debug code locally.
+Install the Linux IntelliJ version, and debug code locally.
 
 - Ask your linux maintainer to install the desktop, and then restart the server.
 - If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide:
 [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html)
 
-- Download [intellij linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to linux server
+- Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server
 - Start Idea, `bash /idea.sh`
 
-Notes: Sometimes, your desktop may stop accidently, left idea running.
-
-```bash
-root@xx2:~bash idea-IC-221.5787.30/bin/idea.sh
-Already running
-root@xx2:~ps ux | grep intellij
-root@xx2:kill -9
-```
-
-And then restart idea.
+## Windows/macOS IntelliJ remote debug
 
-## Windows/Mac intellij remote debug
-
-If you have Ultimate intellij, you can try to debug remotely.
+If you have IntelliJ Ultimate Edition, you can debug Gluten code remotely.
 
 ## Set up gluten project
 
@@ -113,8 +100,8 @@ If you have Ultimate intellij, you can try to debug remotely.
 
 ## Java/Scala code style
 
-Intellij IDE supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE.
-See [Intellij guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style).
+IntelliJ supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE.
+See [IntelliJ guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style).
 
 To generate a fix for Java/Scala code style, you can run one or more of the below commands according to the code modules involved in your PR.
 
@@ -161,7 +148,7 @@ VSCode support 2 ways to set user setting.
 
 ### Build by vscode
 
-VSCode will try to compile the debug version in /build.
+VSCode will try to compile using debug mode in /build.
 And we need to compile velox debug mode before, if you have compiled velox release mode, you just need to do.
 
 ```bash
@@ -259,14 +246,15 @@ Then you can create breakpoint and debug in `Run and Debug` section.
 
 ### Velox debug
 
-For some velox tests such as `ParquetReaderTest`, tests need to read the parquet file in `/velox/dwio/parquet/tests/examples`, you should let the screen on `ParquetReaderTest.cpp`, then click `Start Debuging`, otherwise you will raise No such file or directory exception
+For some velox tests such as `ParquetReaderTest`, tests need to read the parquet file in `/velox/dwio/parquet/tests/examples`,
+you should let the screen on `ParquetReaderTest.cpp`, then click `Start Debuging`, otherwise `No such file or directory` exception will be raised.
 
-## Usefule notes
+## Useful notes
 
-### Upgrade vscode
+### Do not upgrade vscode
 
 No need to upgrade vscode version, if upgraded, will download linux server again, switch update mode to off
-Search `update` in Manage->Settings to turn off update mode
+Search `update` in Manage->Settings to turn off update mode.
 
 ### Colour setting
 
@@ -299,7 +287,7 @@ Set config in `settings.json`
 If exists multiple clang-format version, formatOnSave may not take effect, specify the default formatter
 Search `default formatter` in `Settings`, select Clang-Format.
 
-If your formatOnSave still make no effect, you can use shortcut `SHIFT+ALT+F` to format one file mannually.
+If your formatOnSave still make no effect, you can use shortcut `SHIFT+ALT+F` to format one file manually.
 
 # Debug cpp code with coredump
 
@@ -370,7 +358,9 @@ wait to attach....
 ```
 
 # Debug Memory leak
+
 ## Arrow memory allocator leak
+
 If you receive error message like
 
 ```bash
@@ -378,6 +368,7 @@ If you receive error message like
 24/04/18 08:15:38 WARN ArrowBufferAllocators$ArrowBufferAllocatorManager: Leaked allocator stack Allocator(ROOT) 0/191/319/9223372036854775807 (res/actual/peak/limit)
 ```
 You can open the Arrow allocator debug config by add VP option `-Darrow.memory.debug.allocator=true`, then you can get more details like
+
 ```bash
 child allocators: 0
 ledgers: 7
@@ -403,9 +394,12 @@ child allocators: 0
 at org.apache.gluten.utils.IteratorCompleter.hasNext(Iterators.scala:69)
 at org.apache.spark.memory.SparkMemoryUtil$UnsafeItr.hasNext(SparkMemoryUtil.scala:246)
 ```
+
 ## CPP code memory leak
+
 Sometimes you cannot get the coredump symbols, if you debug memory leak, you can write googletest to use valgrind to detect
-```
+
+```bash
 apt install valgrind
 valgrind --leak-check=yes ./exec_backend_test
 ```
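
Note on the last hunk: the valgrind command it documents runs the whole test binary. A minimal shell sketch of narrowing the run to selected cases is given here, assuming `exec_backend_test` is a googletest binary (the doc text says "write googletest"), so that googletest's `--gtest_filter` option applies. The helper name `leak_check` and the suite name in the usage line are hypothetical and not part of Gluten. `--error-exitcode=1` is a standard valgrind option that makes the run exit non-zero when errors are reported.

```shell
# Hypothetical helper (not part of Gluten): run a googletest binary under valgrind.
#   $1 = path to the test binary, e.g. ./exec_backend_test
#   $2 = optional googletest filter pattern; defaults to '*' (all tests)
leak_check() {
  local binary="$1"
  local filter="${2:-*}"
  # --leak-check=yes matches the command in the patch above;
  # --error-exitcode=1 makes valgrind exit non-zero if it reports errors.
  valgrind --leak-check=yes --error-exitcode=1 \
    "$binary" --gtest_filter="$filter"
}
```

Possible usage, with a placeholder suite name: `leak_check ./exec_backend_test 'SomeSuite.*'`.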