Design Lab Awarded NSF Funds to Train New Graduate Students in Data-Centric Programming

Adapted from an article by Doug Ramsey

In September, the Design Lab at UC San Diego launched a new project to help teach incoming graduate students how to program in the era of “big” data. The project is funded by the National Science Foundation (NSF) Innovations in Graduate Education (IGE) program, and the proposed Design Lab project is one of 10 new IGE grants awarded a total of $4.8 million to “pilot, test and validate innovative and potentially transformative ways to teach science, technology, engineering and mathematics (STEM).”

The UC San Diego team will receive approximately $500,000 over three years to develop a new data-science teaching approach via “Augmenting, Piloting and Scaling Computational Notebooks to Train New Graduate Researchers in Data-Centric Programming.”

The project's Principal Investigator (PI) and Design Lab co-founder is James Hollan, a Distinguished Professor of Cognitive Science with an adjunct appointment in Computer Science and Engineering (CSE). Hollan leads a team including co-PIs Professors Scott Klemmer, Philip Guo and Bradley Voytek. Design Lab co-founder Klemmer has a joint faculty appointment in Cognitive Science and CSE, while Guo and Voytek are professors of Cognitive Science (in Voytek’s case, with an appointment in the Neuroscience Graduate Program as well).

Project leaders (l-r): PI James Hollan, Co-PIs Scott Klemmer, Philip Guo, and Bradley Voytek.

“Virtually all graduate STEM training programs are currently confronting challenges to ensure their students have the computational skills required to function in increasingly data-intensive research domains,” observed Hollan in the proposal. “One singularly important challenge in the current era of big data is the growing need to train new graduate students in the programming and data analysis skills needed to be able to manage and exploit large-scale data in virtually every domain.”

The Design Lab team proposed to take the popular concept of introductory “bootcamps” for new graduate students and scale that approach while exploiting the growing movement toward computational notebooks. Specifically, the researchers propose to augment the Jupyter Notebook, a widely used open-source web application, with other pedagogical tools, many of which they have developed and tested, to support training in data-centric programming across a wide range of STEM disciplines.

Scott Klemmer records Interactive Design MOOC for Coursera.


“This has the potential to improve the efficacy of training graduate students in data-centric programming,” said Klemmer. “But the impact could be much greater in the long run because all of the new capabilities can be harnessed for teaching in other domains, and the open-source nature of the notebooks and tools will ensure that the technology will be widely available via the Web.”

In producing an open-source version of the Jupyter Notebook for teaching data-centric programming, the UC San Diego researchers plan to develop augmented notebooks by integrating other tools that have been widely used, particularly for massive open online courses (MOOCs). For example, Co-PI Klemmer helped develop Talkabout and PeerStudio. “Both systems have been used by tens of thousands of students in dozens of MOOCs on the Coursera online education platform over the past four years,” said Klemmer. Indeed, students and other learners taking Klemmer’s widely watched “Interactive Design” courses on Coursera already have access to the software tools to provide feedback (PeerStudio) and to enable discussion among widely distributed course participants (Talkabout).

Cognitive Science professor Philip Guo 


Co-PI Philip Guo developed Python Tutor to support the tutoring of programming. It has been available for seven years, and in that time, over 3.5 million people in over 180 countries have used Python Tutor to visualize over 30 million pieces of code, either directly online or via Python Tutor’s integration into MOOCs from edX, Coursera and Udacity. People of all ages use Guo’s website to write code in several languages (Python, Java, C, C++, JavaScript, TypeScript, and Ruby). Critically, learners interact with automatically generated visualizations that help them build mental models and debug their code.

PI Hollan has a long history of developing analysis tools (notably, for present purposes, Traces and ChronoViz) to support the education of graduate students at scale. Both are software tools widely used in analyzing video of real-world activity. ChronoViz, for example, is an elegant software system for annotating, visualizing, navigating, and analyzing multimodal time-coded data, with proven potential to increase the efficiency of observational (ethnographic) research. Traces is an unobtrusive system for capturing desktop activity, which has a real chance of becoming a computational remedy for the deleterious consequences of interruptions.

Distinguished Professor Hollan, a recipient of the CHI Lifetime Research Award for a lifetime of innovation and leadership, is characteristically understated when he says, “Our team has deep experience in implementing, deploying and maintaining these tools over extended periods of time.”

Co-PI Bradley Voytek, a professor of Computational Cognitive Science and Neuroscience, has been teaching Introduction to Data Science (COGS 9) since 2014, when he joined the UC San Diego Cognitive Science faculty.

Bradley Voytek already uses Jupyter Notebook in his undergraduate data-science courses.


Voytek notes that COGS 9 enrollment has ballooned from 24 students in its first quarter to 280 students in the latest quarter. In spring 2017, he launched his first upper-division data science course, Data Science in Practice (COGS 108), with approximately 420 students from multiple disciplines.

Voytek’s classes have used the Jupyter Notebook for homework assignments, and he will employ and iteratively test versions of the augmented Jupyter Notebook to assess its utility in university classrooms, in comparison with its use in MOOCs and other distributed learning environments. The resulting system will be available online through a GitHub repository. “This will enable it to be widely shared, evolved and tailored to specific discipline requirements,” said Hollan.

According to NSF, all 10 new projects evaluate approaches that could be scaled for use at other institutions nationally. These include career peer-mentoring, gender-based case studies, faculty and student learning communities, revamped gateway courses, community and family engagement, and digital platforms for real-time feedback.

In addition to UC San Diego, IGE grants in the latest round were awarded to University of Arizona, University of Chicago, University of Arkansas, Montana State, Ohio State, College of William and Mary, Stony Brook University, SUNY Buffalo, Cal Poly and Georgia Tech.

NSF Award #1735234 
Design Lab at UC San Diego