Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
Spark is the most widely used engine for scalable computing. Thousands of companies, including 80% of the Fortune 500, use Apache Spark.
Key Features:
- Batch/Streaming Data
- SQL Analytics
- Data Science at Scale
- Machine Learning
The following examples show how to use Spark with Python (PySpark):
- Spark basic examples: How to use Spark with DataFrames and Spark SQL
- Spark with Neo4j: How to use Spark to run an ETL process
- Spark Writing Streaming with Pub/Sub Lite: How to use Spark to send messages to Pub/Sub Lite
- Spark Reading Streaming with Pub/Sub Lite: How to use Spark to read messages from Pub/Sub Lite
- More examples coming soon
Made with ❤ by jggomez.
Copyright 2023 Juan Guillermo Gómez
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.