Big Data course

CentraleDigitalLab@LaPlateforme


Stéphane Vialle, CentraleSupélec & LISN, Stephane.Vialle@centralesupelec.fr 

Gianluca Quercini, CentraleSupélec & LISN, Gianluca.Quercini@centralesupelec.fr


The main aim of this course is to provide an understanding of algorithms and programming in Big Data paradigms (several variants of Map-Reduce, analysis of structured documents in NoSQL databases, etc.), as well as the technical principles underlying Big Data environments (distributed file systems, fault tolerance and load handling through redundancy, etc.). A quantitative presentation of scaling completes the course, and practical exercises in Spark and MongoDB environments illustrate the concepts covered in the lectures.
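As a rough illustration of the Map-Reduce paradigm mentioned above, here is a minimal word-count sketch in plain Python. It is not taken from the course material and does not use Spark: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, chosen only to show the map, shuffle and reduce steps on a single machine.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit one (word, 1) pair per word of the input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Chain the three steps; a real framework would distribute each one."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data with spark", "spark and mongodb", "big clusters"]
    print(word_count(sample))
```

In a distributed setting, the map and reduce functions would run in parallel on partitions of the data, and the shuffle would move data across the network; this sketch only shows the logical structure.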
    1a - Introduction to MapReduce & Spark
    1 slide per page, 6 slides per page
    1b - Spark application deployment (Lab1 & Lab2)
    1 slide per page, 6 slides per page
    2 - Performance metrics & scalability
    1 slide per page, 6 slides per page
    3a - NoSQL principles and emergence
    1 slide per page, 6 slides per page
    3b - Distributed and NoSQL databases
    1 slide per page, 6 slides per page
    3c - Spark-SQL
    Spark-SQL documentation
    3d - MongoDB: syntax and examples (Tut3 & Lab3)
    1 slide per page, 6 slides per page
    MongoDB introduction notebook
    MongoDB manual
    Tut1: Designing basic Map-Reduce algorithms in Spark
    Statement
    Tut2: Designing advanced Map-Reduce algorithms in Spark
    Statement
    Tut3: MongoDB syntax and examples
    See slides 3d
    Lab1 Part-0: Access to the DCE Spark cluster
    Video of Spark cluster access via "vscode"
    Lab1 Part-1: Discovering HDFS and Spark commands
    Doc-dcejs-ssh-vscode
    Video-vscode
    Discovering-hdfs-spark
    Lab1 Part-2: Designing basic Map-Reduce algorithms in Spark Statement
    Lab2: Designing advanced Map-Reduce algorithms in Spark Statement
    Lab3 Part-1: Access to the DCE's MongoDB servers
    Lab3 Part-2: Getting started and querying a MongoDB database
    MongoDB server access
    Statement
     
    Spark
    Exercise S1
    Exercise S2


    MongoDB
    Exercise M1




    NoSQL DBs:
    Hadoop & Map-Reduce:
    Spark: