020FOBDM2

Mining Massive Data Sets

This course covers the fundamentals of designing dedicated software systems for big data analytics. It begins with the design principles of relational database systems for business data analysis, including declarative queries, query optimization, and transaction management, as well as the evolution of these basic data systems to support complex analytical problems and scientific data management. The course then looks at the fundamental architectural changes needed to scale data processing beyond the limits of a single computer, including parallel databases, MapReduce, column stores, and distributed key-value stores, as well as the computation of low-latency analytical results from real-time data streams. Finally, the course examines advanced data management systems that support various data models, including tree-structured data (XML and JSON) and graph-structured data (RDF), and new workloads such as machine learning tasks (Spark) and mixed workloads (Google Cloud Dataflow).


Contact hours: 35 hours


Student workload: 70 hours


Assessment method(s): Final exam