CSCI E-192
Modern Data Analytics
Data has become a critical asset of the modern era, shaping business decisions, scientific discovery, and everyday life.
Social media platforms, communications, financial and health care systems, web and application logs, and security analytics all depend on the ability to collect, process, and analyze massive volumes of data.
Growing demand for artificial intelligence (AI)-driven insights, generative AI, integration with large language models (LLMs), and predictive analytics makes this ability even more critical.
Modern cloud-based infrastructures provide the foundation for these capabilities.
This course introduces students to the architectural patterns, tools, and platforms that power contemporary data analytics systems.
Students learn the main concepts and technologies involved in building complete end-to-end data analytics solutions; examine architectural blueprints of large-scale data systems; explore the landscape of modern data-processing frameworks and services, including Spark, Apache Beam, Dataflow, Pub/Sub, Redshift, and BigQuery; understand the impact of AI and how it is integrated into and used within data systems; and apply these concepts in hands-on exercises and projects using Amazon Web Services (AWS) and Google Cloud Platform.
Topics include the fundamentals of machine learning and model deployment; design and organization of distributed data storage; principles of data lakes, data warehouses, lakehouses, and data-mesh architectures; integration of business intelligence (BI) tools for visualization; and the growing role of AI in modern data systems and tooling.
Python is used for assignments requiring programming.