Spark for Developers

Prerequisites:
   Prior familiarity with Java, Scala, or Python (we also offer courses on these languages)
   Basic understanding of the Linux development environment (command-line navigation; editing files with vi or nano)

Overview:
   This course is an introduction to Apache Spark. Participants will learn how Spark fits into the Big Data ecosystem and how to use it to analyze data. The course covers Spark for data analysis, Spark internals, the Spark APIs, Spark SQL, Spark Streaming, machine learning, and GraphX.

Course Outline:

1. Scala primer
   A quick introduction to Scala
   Labs: Getting to know Scala

2. Spark Basics
   Background and history
   Spark and Hadoop
   Spark concepts and architecture
   Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
   Labs: Installing and running Spark

3. First Look at Spark
   Running Spark in local mode
   Spark web UI
   Spark shell
   Analyzing dataset – part 1
   Inspecting RDDs
   Labs: Spark shell exploration

4. RDDs
   RDD concepts
   Partitions
   RDD operations / transformations
   RDD types
   Key-value pair RDDs
   MapReduce on RDDs
   Caching and persistence
   Labs: Creating and inspecting RDDs; caching RDDs

5. Spark API programming
   Introduction to Spark API / RDD API
   Submitting the first program to Spark
   Debugging / logging
   Configuration properties
   Labs: Programming in Spark API; submitting jobs

6. Spark SQL
   SQL support in Spark
   DataFrames
   Defining tables and importing datasets
   Querying DataFrames using SQL
   Storage formats: JSON / Parquet
   Labs: Creating and querying DataFrames; evaluating data formats

NobleProg® Limited 2004 - 2016 All Rights Reserved +55 11 4349 3120 | [email protected] | bra.nobleprog.com

7. MLlib
   MLlib intro
   MLlib algorithms
   Labs: Writing MLlib applications

8. GraphX
   GraphX library overview
   GraphX APIs
   Labs: Processing graph data using Spark

9. Spark Streaming
   Streaming overview
   Evaluating streaming platforms
   Streaming operations
   Sliding window operations
   Labs: Writing Spark Streaming applications

10. Spark and Hadoop
   Hadoop intro (HDFS / YARN)
   Hadoop + Spark architecture
   Running Spark on Hadoop YARN
   Processing HDFS files using Spark

11. Spark Performance and Tuning
   Broadcast variables
   Accumulators
   Memory management and caching

12. Spark Operations
   Deploying Spark in production
   Sample deployment templates
   Configurations
   Monitoring
   Troubleshooting

Duration: 21 hours

Course Categories: Big Data, Apache Spark