Developed by Feng Li
School of Statistics and Mathematics
Central University of Finance and Economics
[email protected]
- Distributed Statistical Computing for Big Data and Case Studies (大数据分布式计算与案例), ISBN: 9787300230276
  - Available at JD.COM
- New version (In Preparation)
You can view all the notebooks in this repository via the Jupyter Notebook Viewer.
Requirements to run the notebooks interactively
- Python (>= 3.6.0): findspark (invoke Spark from a Python session), numpy, scipy, pandas
- Hadoop (>= 2.7.0)
- Hive (>= 2.3.3)
- Spark (>= 2.3.1)
- Jupyter Notebook (>= 5.0)
- RISE (for Jupyter slides). Use Alt+R to enter slideshow mode.
- Bash Kernel (for Linux and Hadoop, Hive, Spark batch mode)
- IPython kernel for Python 3 (for interactive PySpark sessions)
- HiveQL Kernel (for interactive Hive sessions)
- Spark Toree (for interactive Spark Scala sessions)
-