Brief summary
Some Wikipedia editors work as a team through WikiProjects to monitor and improve the quality of articles in specific topic areas. How can anyone explore the expansion and quality improvement of articles covered by WikiProjects over time, to assess the impact of this community effort?
Article quality is usually assessed by editors against wiki-specific criteria. Because keeping these assessments complete and up to date is largely impossible given the ever-changing nature of Wikipedia, the Wikimedia Foundation's Research team has developed a model to predict the quality of Wikipedia articles. The model is based on language-agnostic features: page length, number of references, number of sections, number of wikilinks, number of media files, and number of categories. Based on this model, a dataset with feature values and predicted quality scores for all revisions of all articles in more than 300 language editions of Wikipedia has just been published. This dataset therefore provides key information on the expansion and quality of Wikipedia articles over time. In this project we will build on this novel data asset to develop a visualization tool that lets anyone explore the evolution of quality in articles maintained by WikiProjects.
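For orientation, here is a minimal sketch of loading and inspecting such a dataset with pandas. The file name and the column names (page_id, timestamp, pred_qual) are placeholders, not the published schema; check the dataset documentation for the actual fields.

```python
# Minimal sketch: load the quality dataset and take a current snapshot.
# NOTE: the file name and column names below are assumptions, not the
# published schema -- adapt them to the real dataset documentation.
import pandas as pd

df = pd.read_csv(
    "article_quality_enwiki.tsv.gz",   # hypothetical file name
    sep="\t",
    parse_dates=["timestamp"],         # assumed revision-timestamp column
)

# Keep only the latest revision per article to get a snapshot of
# current predicted quality across the whole wiki.
latest = df.sort_values("timestamp").groupby("page_id").tail(1)
print(latest["pred_qual"].describe())
```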
Specifically, the work will comprise the following phases:
- Becoming familiar with the dataset: its structure, how to pre-process it, and what additional features (e.g. pageviews) could be added
- Selecting specific WikiProjects of interest
- Analyzing and visualizing the language-agnostic features of article revisions from those WikiProjects over time (see the first sketch after this list)
- Building an interactive web interface for users to explore the data visualizations (see the second sketch)
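As an illustration of the analysis and visualization phase, the sketch below plots the yearly mean predicted quality of articles belonging to a single WikiProject. It reuses the hypothetical DataFrame and column names from the snippet above and assumes a pre-fetched list of the project's page IDs.

```python
# Sketch: yearly trend of mean predicted quality for one WikiProject.
# Assumes a DataFrame like `df` above (hypothetical columns: page_id,
# timestamp, pred_qual) and a list of page IDs fetched separately,
# e.g. from the project's assessment tables.
import pandas as pd
import matplotlib.pyplot as plt

def plot_project_quality(df: pd.DataFrame, project_page_ids: list[int]) -> None:
    proj = df[df["page_id"].isin(project_page_ids)]
    # Average the predicted quality of revisions made in each year
    # ("YE" = year-end; use "Y" on pandas older than 2.2).
    yearly = proj.set_index("timestamp")["pred_qual"].resample("YE").mean()
    yearly.plot(marker="o")
    plt.xlabel("Year")
    plt.ylabel("Mean predicted quality")
    plt.title("WikiProject article quality over time")
    plt.tight_layout()
    plt.show()
```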
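For the interactive web interface, one lightweight option among several possible stacks is Streamlit; the sketch below lets a user pick a feature and view its yearly trend. The file and column names are the same placeholders as in the snippets above, and the feature list is illustrative.

```python
# Sketch of an interactive explorer with Streamlit (one possible
# framework; the project may settle on a different stack).
# Run with: streamlit run app.py
import pandas as pd
import streamlit as st

@st.cache_data
def load_data() -> pd.DataFrame:
    # Hypothetical file and columns, as in the earlier snippets.
    return pd.read_csv("article_quality_enwiki.tsv.gz", sep="\t",
                       parse_dates=["timestamp"])

df = load_data()
feature = st.selectbox(
    "Feature to plot",
    ["pred_qual", "page_length", "num_refs", "num_sections"],  # illustrative
)
yearly = df.set_index("timestamp")[feature].resample("YE").mean()
st.line_chart(yearly)
```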
Skills required
- Python for data analysis, modeling and visualization
- HTML/CSS/JS and general design
- Jupyter notebooks for documentation (nice to have but willingness to learn is sufficient)
Possible mentor(s)
Microtasks
See T358095