Repository for the university course MT4007 (HT24) at Stockholm University.
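Week 1: Installed essential tools (Git/GitHub, Miniconda, Jupyter, VS Code) and learned key Git concepts: forking (creating your own copy of a repository on GitHub), cloning (downloading a local copy that stays linked to its remote), and branching (keeping separate lines of development within one repository). Also stressed that a correct analysis requires accurate, reproducible outputs.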
Week 2: Introduced dataframes for organizing and manipulating tabular data through operations like sorting, selecting, filtering, mutating, and grouping. The visualization part emphasized using plots effectively to interpret data and choosing an appropriate plot type (e.g., line chart, bar plot) for the analysis goal.
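
As a rough illustration of the Week 2 dataframe operations, here is a minimal sketch that assumes pandas is installed. The toy data and the column names (`city`, `year`, `sales`) are made up for this example and are not taken from the course material.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "city": ["Stockholm", "Uppsala", "Stockholm", "Uppsala"],
    "year": [2023, 2023, 2024, 2024],
    "sales": [120, 80, 150, 95],
})

# Sorting: order rows by sales, largest first.
sorted_df = df.sort_values("sales", ascending=False)

# Selecting: keep only some of the columns.
selected = df[["city", "sales"]]

# Filtering: keep only rows that satisfy a condition.
filtered = df[df["year"] == 2024]

# Mutating: add a new column derived from an existing one.
mutated = df.assign(sales_thousands=df["sales"] / 1000)

# Grouping: aggregate sales per city.
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Each operation returns a new object and leaves `df` unchanged, which makes it easier to rerun individual steps in a notebook.
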
Week 3: Focused on Exploratory Data Analysis (EDA) and data processing. Emphasized understanding data systematically by asking questions and answering them through transforming and visualizing the data. Discussed structuring data so that each variable is a column, each observation is a row, and each value is a cell. Addressed data cleaning techniques, including handling missing data through imputation or removal, and dealing with outliers by identifying anomalous values and deciding how to treat them.
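
The cleaning steps from Week 3 could look roughly like the sketch below, again assuming pandas. The data is invented, and the choices of median imputation and the 1.5 * IQR rule for flagging outliers are assumptions made for this example, not necessarily the methods used in the course.

```python
import pandas as pd

# Hypothetical measurements: one missing value and one suspicious reading.
# Each variable is a column and each observation is a row.
df = pd.DataFrame({
    "sensor": ["a", "a", "a", "b", "b", "b"],
    "temp": [20.1, None, 19.8, 21.0, 95.0, 20.5],
})

# Option 1 (removal): drop rows where temp is missing.
dropped = df.dropna(subset=["temp"])

# Option 2 (imputation): fill missing temps with the column median.
imputed = df.assign(temp=df["temp"].fillna(df["temp"].median()))

# Flag outliers with the 1.5 * IQR rule (an assumed, common heuristic).
q1 = imputed["temp"].quantile(0.25)
q3 = imputed["temp"].quantile(0.75)
iqr = q3 - q1
is_outlier = (imputed["temp"] < q1 - 1.5 * iqr) | (imputed["temp"] > q3 + 1.5 * iqr)
outliers = imputed[is_outlier]
print(outliers)
```

Flagging outliers instead of deleting them right away keeps the decision about how to treat them separate from the detection step.
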
Week 4: Covered data integration and pattern recognition techniques. Explored the main types of data joins (left, right, inner, and outer) for merging datasets. Introduced Structured Query Language (SQL) for database querying, covering basic syntax for data retrieval and storage. Discussed the use of Regular Expressions (RegEx) for pattern matching in strings, highlighting their application across tools like Git, Python/R, and SQL for efficient data processing.
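
The Week 4 topics can be sketched together as follows, assuming pandas for the joins and Python's built-in `sqlite3` and `re` modules for SQL and RegEx. The tables, the names, and the course code `MT3001` are hypothetical and exist only for this example.

```python
import re
import sqlite3

import pandas as pd

# Hypothetical tables, invented for illustration.
rows = [(1, "Anna"), (2, "Bo"), (3, "Carl")]
students = pd.DataFrame(rows, columns=["id", "name"])
grades = pd.DataFrame({"id": [2, 3, 4], "grade": ["A", "B", "C"]})

# Joins: the four types differ in which unmatched rows they keep.
joins = {how: students.merge(grades, on="id", how=how)
         for how in ["left", "right", "inner", "outer"]}
for how, result in joins.items():
    print(how, len(result))  # left 3, right 3, inner 2, outer 4

# SQL: store the students in an in-memory SQLite database, then query it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)", rows)
query = "SELECT name FROM students WHERE id > 1 ORDER BY name"
names = [row[0] for row in conn.execute(query)]
conn.close()
print(names)  # ['Bo', 'Carl']

# RegEx: find codes made of two capital letters followed by four digits.
text = "Courses: MT4007, MT3001 and mt1234"
course_codes = re.findall(r"\b[A-Z]{2}\d{4}\b", text)
print(course_codes)  # ['MT4007', 'MT3001']
```

The inner join keeps only ids present in both tables, while the outer join keeps every id and fills the gaps with missing values.
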