This course was made with the help of artificial intelligence. AI tools were used to help produce the input data and some visual materials; all technical content, code, and teaching are entirely my own.
Are you stuck at pandas?
You know Python and you’ve used pandas, but the moment a project involves millions of rows or a job description mentions PySpark, it feels like a different world: a different mental model, a different syntax, and tutorials that rarely help. This course bridges that gap.
What you’ll build
Starting from raw CSV files, you’ll build a complete PySpark pipeline: clean and enrich the data, aggregate it by age group, gender, and app category, compute a behavioral evolution index using window functions, and write production-ready Parquet output. Real dataset, real questions, real pipeline: something you could show in a technical interview tomorrow.





