youcubed Data Science Curriculum

I recently finished helping a small online high school create a new data science course built on Stanford’s youcubed Explorations in Data Science. It’s a snappy curriculum that is thoughtful and modern in its topic selection, though its resources are a bit ragged at the edges. Its lesson layout is clear, its tools of choice are accessible and modifiable, and in the spirit of decades of statistics-oriented education, it teaches students to be wary of practitioners who misrepresent data, whether through thoughtlessness or malice. I wrote this course to run in a format that relies heavily on self-directed work, wholly different from what youcubed anticipated, but the curriculum was an excellent base that created ample opportunity for differentiation. By all accounts, students are loving the results.

The most important aspect of the curriculum is the project portfolio. I commend youcubed for their fascinating projects that cover different aspects of data science. They provide data when it makes sense, and otherwise provide opportunities for students to find or generate their data, then clean and analyze it.

The cleverest project involves analyzing skin tones in magazines. Students find magazines online and use a color picker that the program provides to sample the skin tones of the models in them. This lets students thoughtfully explore the bias inherent in parts of culture and how to handle it from a data science perspective. They learn how to move between categorical and quantitative skin tone measurements and how the analysis differs for each.

This project also showcases the tools that youcubed creates to enable real interaction with Python programs without being overly focused on writing code from scratch. The color picker tool generates a Python dictionary that students copy into a Google Colab notebook, and they can slightly modify the notebook depending on how they decide to analyze their data. The curriculum frequently takes this approach: provide a robust program and point students to elements that can be reasonably modified. The curriculum also uses EduBlocks to teach basic programming logic, and an instructor can adjust that portion depending on how much emphasis to place on these concepts versus broader data science investigations.
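To give a sense of what that workflow looks like, here is a minimal sketch. The dictionary layout, the sample names, the hex values, and the category cutoffs are all my own assumptions for illustration; they are not the actual output of youcubed's color picker or the contents of their notebook.

```python
import colorsys
from collections import Counter

# Hypothetical picker output: the real tool's dictionary layout may differ.
samples = {
    "cover_model": "#f1c27d",
    "page_12_ad": "#8d5524",
    "page_30_feature": "#ffdbac",
    "page_41_ad": "#c68642",
}


def hex_to_lightness(hex_color):
    """Return the HLS lightness (0.0 to 1.0) of a '#rrggbb' color string."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    _, lightness, _ = colorsys.rgb_to_hls(r, g, b)
    return lightness


def lightness_to_category(lightness, cutoffs=(0.4, 0.7)):
    """Bin a lightness value into a label. The cutoffs are arbitrary assumptions."""
    if lightness < cutoffs[0]:
        return "darker"
    if lightness < cutoffs[1]:
        return "medium"
    return "lighter"


# Quantitative view: one number per sample, suitable for means or histograms.
lightness_values = {name: hex_to_lightness(c) for name, c in samples.items()}
mean_lightness = sum(lightness_values.values()) / len(lightness_values)

# Categorical view: one label per sample, suitable for counts or bar charts.
categories = {name: lightness_to_category(v) for name, v in lightness_values.items()}
category_counts = Counter(categories.values())

print(f"Mean lightness: {mean_lightness:.2f}")
print(dict(category_counts))
```

The same samples support two different analyses: the quantitative view invites averages and distributions, while the categorical view invites counts and proportions. Moving the cutoffs changes the categorical story without touching the underlying data, which is exactly the kind of choice worth discussing with students.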

As the course concludes, students will have built a solid portfolio of projects spanning data types attached to relevant cultural and societal topics. These include analyzing skin tone bias in magazines, exploring music recommendation algorithms, and developing a score for determining which cities a student might enjoy. It’s a wonderful beginning for creating thoughtfully analytical citizens who can explore questions on their own, and also turn an appropriately skeptical eye to data they see tossed about in the wild.
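As one illustration of the scoring idea, a city score can be as simple as a weighted average of ratings. This is only a sketch under my own assumptions: the city names, criteria, ratings, and weights below are made up, and the actual project may build its score differently.

```python
# Hypothetical ratings on a 0-10 scale; illustrative numbers, not real data.
cities = {
    "City A": {"weather": 8, "cost": 4, "music_scene": 9},
    "City B": {"weather": 5, "cost": 8, "music_scene": 6},
}

# How much this (hypothetical) student cares about each criterion.
weights = {"weather": 0.5, "cost": 0.2, "music_scene": 0.3}


def city_score(ratings, weights):
    """Weighted average of ratings, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(ratings[name] * w for name, w in weights.items()) / total_weight


scores = {city: city_score(ratings, weights) for city, ratings in cities.items()}

# Rank cities from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)

for city in ranked:
    print(f"{city}: {scores[city]:.1f}")
```

Changing the weights changes the ranking, which makes the point that a score encodes the priorities of whoever designed it.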

Just as important, as the basis for a course, the materials are incredibly flexible. A teacher strapped for time could use everything provided as-is in a typical classroom environment and have a great time. I needed to adapt it to an instruction model that relies on higher student autonomy, but the explanations in each lesson made this shockingly simple. I was able to use the slides and handouts as a starting point, then augment them to align with self-directed learning. Some activities require instructor assistance, so I kept those in the synchronous instruction portions, while students could attack the individual and group work on their own.

Students are having fun in the class, the instructor has been able to tweak assignments based on how comfortable students are with the programming concepts, and the ethical discussions are apt and valuable for a high school class. It leaves me wanting to help youcubed polish the raw materials so that this curriculum shines and gets adopted by any high school with the bandwidth to add it as an elective course.
