Harvard

CSCI S-108

Data Mining, Discovery, and Exploration

Extracting actionable insights and relationships from massive complex data sets is the domain of data mining.

Data mining has wide-ranging applications in science and technology.

This course addresses several key aspects of data mining, including the use of key-value pairs and hashing methods to manage and compute approximate analytics for massive-scale datasets; highly scalable approximate similarity search and embedding algorithms for information retrieval, as used in retrieval-augmented generation (RAG), web search, image search, and recommendation systems; algorithms for ranking search and recommendation results; memory-efficient sketch algorithms for unbounded data, such as streaming data and online processing of massive datasets; unsupervised learning, including clustering models and dimensionality reduction algorithms for finding and exploring relationships in massive complex datasets; and graph representations and algorithms for search and social network analysis.

The course comprises readings and lectures on theory along with hands-on exercises and projects where students apply theory through Python coding and interpretation of results.

The hands-on component of the course uses a variety of Python libraries, including scikit-learn, NetworkX, and FAISS, along with deep-learning platforms and packages.

Students enrolled for graduate credit are required to perform, present, and report on an independent project.

This project must demonstrate mastery of the methods covered in the course as applied to a suitable real-world data set.

Schedule note
TTh 6:30pm - 9:30pm Jun 21 to Aug 6
