020IDSES3 | Introduction to Data Science |
---|---|
This course offers a practical introduction to the core concepts, tools, and workflows of data science. Students explore the typical steps of a data scientist's workflow: data acquisition, wrangling, analysis, modeling, and visualization. NumPy and pandas are introduced for handling structured data, along with SQL (SQLite, pandasql) and APIs for data access. Students learn data cleaning techniques such as type handling, outlier detection, partial deletion, and imputation. The course covers exploratory data analysis (EDA) supported by statistical significance tests, introducing both parametric and non-parametric methods, including Student's t-test, Welch's t-test, the Shapiro-Wilk test, and the Mann–Whitney U test. Fundamental machine learning techniques are introduced, with a focus on linear regression, gradient descent, and model evaluation using the coefficient of determination (R²). Students also study the principles of effective data visualization, including visual cues, coordinate systems, scaling, data types, and contextualization, with practical plotting exercises in Python. Finally, the course introduces big data concepts, including the basics of MapReduce and Hadoop. Hands-on work is carried out in Jupyter Notebooks throughout. Contact hours: 30 hours. Student workload: 70 hours. Assessment method(s): final exam, midterm exam, personal work. |