Course Outline

Spark for Developers
Prerequisites:
Prior familiarity with Java, Scala, or Python (we also offer courses in these languages)
Basic understanding of the Linux development environment (command-line navigation, editing
files with vi or nano)
Overview:
This is an introductory course on Apache Spark. Participants will learn how Spark fits into the
Big Data ecosystem and how to use it to analyze data. The course covers Spark for data analysis,
Spark internals, the Spark APIs, Spark SQL, Spark Streaming, machine learning, and GraphX.
Course Program:
1. Scala primer
A quick introduction to Scala
Labs : Getting to know Scala
2. Spark Basics
Background and history
Spark and Hadoop
Spark concepts and architecture
Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
Labs : Installing and running Spark
3. First Look at Spark
Running Spark in local mode
Spark web UI
Spark shell
Analyzing dataset – part 1
Inspecting RDDs
Labs: Spark shell exploration
4. RDDs
RDD concepts
Partitions
RDD Operations / transformations
RDD types
Key-Value pair RDDs
MapReduce on RDD
Caching and persistence
Labs : creating & inspecting RDDs; Caching RDDs
5. Spark API programming
Introduction to Spark API / RDD API
Submitting the first program to Spark
Debugging / logging
Configuration properties
Labs : Programming in Spark API, Submitting jobs
6. Spark SQL
SQL support in Spark
DataFrames
Defining tables and importing datasets
Querying data frames using SQL
Storage formats : JSON / Parquet
NobleProg® Limited 2004 - 2016 All Rights Reserved
+55 11 4349 3120 | [email protected] | bra.nobleprog.com
Labs : Creating and querying data frames; evaluating data formats
7. MLlib
MLlib intro
MLlib algorithms
Labs : Writing MLlib applications
8. GraphX
GraphX library overview
GraphX APIs
Labs : Processing graph data using Spark
9. Spark Streaming
Streaming overview
Evaluating Streaming platforms
Streaming operations
Sliding window operations
Labs : Writing Spark Streaming applications
10. Spark and Hadoop
Hadoop Intro (HDFS / YARN)
Hadoop + Spark architecture
Running Spark on Hadoop YARN
Processing HDFS files using Spark
11. Spark Performance and Tuning
Broadcast variables
Accumulators
Memory management & caching
12. Spark Operations
Deploying Spark in production
Sample deployment templates
Configurations
Monitoring
Troubleshooting
Duration:
21 hours
Course Categories:
Big Data
Apache Spark