Machine Learning (ML) based approaches have become ubiquitous in many areas of society, industry and academia. Understanding what ML provides and reproducing what it infers has become an essential prerequisite for adoption. In this line of thought, course materials, introductory media and lecture series of broad variety, depth, and quality are publicly available. To date, and to the best of our knowledge, there is no structured approach to collecting and discussing best practices in teaching Machine Learning. This workshop strives to change this.
With our workshop, we want to foster an ongoing academic discussion on best practices. We would like to help improve existing material as a community and make conceiving new material more effective. We are very happy that this idea was accepted into the ECML PKDD 2020 and ECML PKDD 2021 workshop programmes. We hope to continue this in 2022.
The workshop programme is split into two parts:
This event will take place on September 13, 2022, starting at 2:45 pm CEST. The times and presenters are listed in the table below. All participation details, e.g. the Zoom connection details, are available on the event notepad.
Note: No ECML ticket is needed to participate in the satellite event.
| Time | Session | Presenter |
|---|---|---|
| 2:45 pm | Welcome and Housekeeping | The Organizers |
| 3:00 pm | MOOC: Machine learning in Python with scikit-learn | David Arturo Amor Quiroz |
| 4:00 pm | Teaching in the Open: advancing education by adopting open source and open science practices | Lorena Barba |
The MOOC “Machine learning in Python with scikit-learn” first aired in spring 2021 and has since run for two sessions, with an average of 11,500 registered participants per session. In this talk, we will discuss how teachers and students can use the material, as well as the technical and pedagogical choices made and the general experience the team has gained through the first two sessions.
The course is free of charge, requires no installation, and includes a final attestation and a discussion forum where scikit-learn core developers answered students’ questions. It is a hands-on course with 7 modules (plus 1 introductory module), 15 video lessons, 70 programming notebooks, 26 quizzes, 7 wrap-up quizzes and 21 non-graded exercises. A static version of the course material is available as a Jupyter Book, and the code is on GitHub, where anyone can contribute.
David Arturo Amor Quiroz did his PhD in physics at the Institute of Nuclear Sciences of UNAM, Mexico (2014-2018). He currently works at the National Institute for Research in Digital Science and Technology (INRIA), France, as part of the maintenance team of the machine learning library scikit-learn.
Subjects like machine learning and data science are changing very fast. It is therefore a challenge for any single department or college to teach high-quality, up-to-date courses on these topics. By adopting the ethos and processes of open source software and open science, we can enhance quality and outcomes in teaching new subjects. Here, ethos means the practices and values that characterize open source communities. The open education movement, which started in the 1990s, was inspired by open source software; its most visible efforts have been open courseware (OCW) and open educational resources (OER). However, it missed key features: the open development model, community building, and networked collaboration. Teaching in the Open means forming collaborations to develop curricula and content, sharing learning objects under permissive licenses, planning for reuse from the beginning, participating in peer review of content and learning objects, and accepting community contributions. In this vein, we founded The Journal of Open Source Education (https://jose.theoj.org), which publishes papers on both educational software and learning modules, particularly for computing-based courses. The contributions of authors, editors, and reviewers show that communities of teacher-scholars are forming, growing, and having an impact.
Lorena A. Barba is a professor of mechanical and aerospace engineering at the George Washington University in Washington, DC. An international leader in computational science and engineering, she is also a long-standing advocate of open source software for science and education, and she is well known for her courses and open educational resources. She received the 2016 Leamer-Rosenthal Award for Open Social Sciences, and in 2017 she was nominated for and received an honorable mention in the Open Education Awards for Excellence of the Open Education Consortium. From 2014 to 2021, Barba served on the Board of Directors of NumFOCUS, a 501(c)(3) public charity in the United States that supports and promotes world-class, innovative, open-source scientific software. She is an expert in research reproducibility and was a member of the National Academies study committee on Reproducibility and Replicability in Science, which released its report in 2019. She served as Reproducibility Chair for the SC19 (Supercomputing) Conference, is Editor-in-Chief and track editor for Reproducible Research in IEEE’s Computing in Science & Engineering, was founder and Associate Editor-in-Chief of the Journal of Open Source Software, and is Editor-in-Chief of The Journal of Open Source Education. She was General Chair of the global JupyterCon 2020 and was named Jupyter Distinguished Contributor in 2020.
As our community has strong European and US contingents, we reflect this in the setup of our workshop on September 23, 2022, in Grenoble (France).
The workshop will take place at the conference (European community, hybrid setup) and, after the conference, online only (US and European community). We start with a hybrid workshop at ECML’22 in Grenoble from 2:30 pm to 6:30 pm CEST.
For every paper, the authors need to provide a poster (or similar) and a short video introducing the paper. The videos will be posted along with the paper link on our website. Before the workshop, every participant will receive an email with a random selection of three videos to watch.
The key points of the workshop discussion will be collected in a summary paper, published in PMLR alongside all papers of the workshop.
Many experts and practitioners who develop Machine Learning models, or the infrastructure around these models, are offered the opportunity to teach Machine Learning at some point in their career. Traditionally, many rely on gut feeling when designing such courses. The tools of choice are often PowerPoint or similar slide software.
This workshop targets those who would like to know how teachers from around the globe approach teaching Machine Learning: How deep do they dive into the matter? What mental models do they use to visualize concepts? Which media do others use when teaching ML?
With this workshop, we hope that all participants get a better sense of where they stand with their teaching and how they can improve it or collaborate with others.
The main goal of this workshop is to motivate and nourish best practices at every stage of the teaching process. For this, we would like to cover a structured approach to teaching inspired by The Carpentries, or a variation thereof. We believe its core concepts are helpful for any teaching practitioner.
The central activity of the workshop will be twofold:
a call for papers in which teaching professionals and beginners are asked to describe their method of choice when teaching a given ML topic. We would like to attract mini-articles of at most 4 pages (excluding references and acknowledgements) that present or discuss a teaching activity related to machine learning. For more details, see below.
accepted papers will be shared in a community connection session, where presenters and participants can discuss the papers. (This is similar to a poster session, but without posters.)
The maximum length of papers is 4 pages (excluding references and acknowledgements) in this format. The program chairs reserve the right to desk reject any over-length papers without review. Papers that ‘cheat’ the page limit, including but not limited to by using smaller than specified margins or font sizes, will also be treated as over-length. Note that, for example, negative \vspace commands are also not allowed.
Additional materials (e.g. proofs, audio, images, video, data, or source code) can be provided as URLs within the submitted paper. The reviewers and the program committee reserve the right to judge the paper solely on the 4 pages; looking at any additional material is at the discretion of the reviewers and is not required. To avoid disclosing the authors’ identities, please consider using tools like anonymous.4open.science.
We strive to pursue a double-blind review process. All papers need to be anonymized on a best-effort basis. We strongly encourage authors to also make code and data available anonymously (e.g., in an anonymous git repository or Dropbox folder). A (non-anonymous) pre-print may be online, but it should not be cited in the submitted paper, to preserve anonymity. Reviewers will be asked not to search for such pre-prints.
For past content accepted at our workshop, please see the proceedings of 2021 and 2020. We are open to any submission aligned with the goals of our workshop. In 2022, we cordially encourage authors to focus on
We will conduct an open, double-blind peer review of all contributions on openreview.net and select contributions based on the reviewers’ feedback. Here are the important dates:
Each submitted paper will be reviewed publicly by at least two experienced machine learning instructors.
by Ludwig Bothmann, Sven Strickroth, Giuseppe Casalicchio, David Rügamer, Marius Lindauer, Fabian Scheipl, Bernd Bischl
by Gero Szepannek, Laurens Tetzlaff, Alexander Frahm, Karsten Lübke
by Gulustan Dogan
by Lukas Lodes, Alexander Schiendorfer
by Stefan Kesselheim, Jan Ebert, Danimir T Doncevic
by Ken Hasselmann, Quentin Lurkin
by Florian Huber, Dafne Erica van Kuppevelt, Peter Steinbach, Colin Sauze, Yang Liu, Berend Weel
by Tilman Michaeli, Stefan Seegerer, Lennard Kerber, Ralf Romeike
by Donatella Cea, Helene Hoffmann, Marie Piraud
by Matias Valdenegro-Toro, Matthia Sabatelli
We are extremely grateful to the volunteers who have made this event possible in past years by reviewing submitted papers. We hope to attract reviewers again this year. If you are interested, please contact us as indicated below.
Clare Boothe Luce Assistant Professor Department of Computer Science and Program in Statistical & Data Sciences Smith College
Team Lead AI Consultants for Matter Research at Helmholtz-Zentrum Dresden-Rossendorf
Research fellow at the HTW Dresden in the department of artificial intelligence.