
SDS 1.2.1.A Hypothesis - gather AI Product use-cases, and identify external ML models to be tested
Closed, Resolved · Public

Description

This task collects ongoing work on the SDS 1.2.1 hypothesis. The hypothesis text is defined as:

If we gather use cases from product and feature engineering managers around the use of AI in Wikimedia services for readers and contributors, we can determine if we should test and evaluate existing AI models for integration into product features, and if yes, generate a list of candidate models to test.

Work on this hypothesis will include:

  • Define and rank existing use-cases for AI integration into products based on conversations with product leaders T370134
  • Make these use-cases more specific, so that we can choose the appropriate AI models to test T370134
  • Identify existing models to test for each use-case based on a set of criteria T370135
  • Final prioritization: select top use-cases to be tested based on product input, model/data availability, and measurability T370134#10069045

Event Timeline

Weekly update:

  • Drafted a planning doc gathering the steps needed to achieve the hypothesis, and a few high-level to-dos.
  • Finalized and categorized a list of AI use cases for product integration ["Proposed" sheet], sourced and arranged from several existing documents and discussions. Added ideas about potential impact and existing efforts. Performed an initial feasibility and scoping analysis based on the constraints of this hypothesis, and excluded a few use-cases based on technical/scientific constraints ["Not Feasable / Out of Scope" sheet], or because efforts are already ongoing as part of this AP ["Ongoing" sheet].
  • Drafted an initial set of questions we would like PMs to answer as they go through the candidate AI use-cases. Many to-dos there as I'd want to keep this process efficient and effective so that we can have a shortlist of prioritized AI use-cases relatively soon.
  • Set up a meeting with Research Scientists involved in WE1.2.4 and WE3.1.3 to discuss alignment and next steps (the 3 hypotheses have some overlapping areas).
Miriam renamed this task from SDS 1.2.1 Hypothesis - Q1 to SDS 1.2.1 Hypothesis - gather AI Product use-cases, and identify external ML models to be tested. Jul 10 2024, 3:09 PM
Miriam triaged this task as High priority.

Weekly update:

  • Started from the top of this doc to investigate potential external models to be used for each use-case.
  • Started investigating contractor resources to help with the hands-on part of this work for September/October
  • Met with Research Scientists leading WE1.2.4 and WE3.1.3 to discuss alignment and next steps (the 3 hypotheses have some overlapping areas). We concluded that there is alignment in at least 3 aspects, and that we will continue to meet every other week:
    1. Define criteria for model selection
    2. Decide whether to use ChatGPT as baseline/reference, and if we should request resources on that front
    3. Define what questions we would like PMs to answer, to have enough detail on the use-cases to be able to select the right models

Weekly updates:

  • Started implementing the process for AI use-case prioritization. Conversations with Search, ML, and Language leaders.
  • Established alignment with and gave input to AI strategy T340693
  • Moved forward contractor processes, tentatively concluding and confirming fees and timelines by next week.

Weekly updates:

  • Survey input period has concluded. We received 13/16 responses from Product Managers. Thank you all for the contributions!
  • We analyzed the responses and ranked use-cases accordingly (see T370134)
  • Finalised contracts to support the hypothesis. The contractor is already working on the model selection phase.

Weekly updates:

  • Advanced substantially on the model selection phase: identified promising externally-available AI models to test for around half of the use-cases.
  • For each use-case, evaluated the presence of existing data for evaluation in multiple languages (column N), and in absence of that, estimated the potential effort needed to collect new evaluation data (columns O and P).
  • Started conversations with Machine Learning about infrastructure constraints as to whether we would be able to host these models on LiftWing.

Is this also about users naming models that are useful? I know of an AI tool that is extremely useful in practice which integrates several models and for which I'd like to have some integration into Wikimedia Commons so things go smoother, quicker, in standardized ways, and better than already demonstrated.

Hi @Prototyperspective thanks for chiming in! Sure, any suggestions about AI tools that could be useful for integration into Wikimedia projects are welcome. @Aroraakhil is looking at selecting the models to be tested.

Ok great. So I think there overall are three main ways AI can be used and while I'm likely missing some I think those are machine translation (one two), video dubbing + subtitles, and spoken Wikipedia.

Using text2image tools can also be useful for illustrating things for which there are no good free images such as digital art that illustrates the style and topics of an art genre and I don't know if some AI tool could be useful for creating charts and software development (not considering recommender systems for "Read more"/app feeds and Suggested Edits).

The tool I have in mind is SoniTranslate – please see the guide pages on creating spoken Wikipedia audios and dubbing videos + creating TimedTexts I linked above. Examples are included there and I think it's ready for use which the examples created with limited experience I think demonstrate. I also listed several issues that I identified and for which I'll create issues in the SoniTranslate software repo (it's open source) in these two pages. I think the models it uses can be found in its readme (under Credits and these two linked further up pyannote/speaker-diarization pyannote/segmentation). Let me know if you have any questions and things can also be asked at the Talk pages of the linked pages.

What I think is missing is some integrated tool that supports quick, easy, many-language and high-quality creation of spoken Wikipedia audios, redubbed videos and transcribed subtitles. This could be some extension to SoniTranslate one installs locally and a gadget/WMC-script or (best option) a Web UI for registered users with a gadget where one can e.g. click "Redub this video" and it guides the user through the process in a way similar to video2commons. I think there are few things that would be as useful when it comes to media in Wikimedia projects (like missing media in non-English Wikipedias or other-language media in such Wikipedia articles & Wikidata items).

Hi @Prototyperspective thank you so much for this message!

One note about this hypothesis is that its current scope is gathering and prioritizing AI requirements and use-cases from our Product teams. We just finished gathering a first set of ideas, and you can see a draft of the current list in the project's Meta page. We are now at the Model Selection stage, where we are doing a literature review to find existing AI models that could be a good match for these use-cases.

Your suggestion is extremely helpful as one of the use-cases specifically related to your comment is "Text to speech". So we will look at the models behind the SoniTranslate tools to check what could be potentially used in support of that use-case.

Regarding the broader product/feature request about an integrated tool that supports quick creation of spoken Wikipedia audios, redubbed videos and transcribed subtitles, my suggestion is that you submit a request via Wishlist, to make it more visible to the rest of the community and our Product teams.

I hope that helps, thank you so much again for your comment!

@Miriam Thanks for this info and for looking into it. I indeed found that text-2-speech applications are missing in that table. I don't know if it would make sense to deal with machine translation separately, but if not, that is also not yet included there despite being the key use-case for Wikipedia (and already widely applied).

At "is the model open-sourced (in some form)" I think it may be good to also add something about the code: it's more important whether the code is free and open source than specifically whether or not it's on HuggingFace, and the code could be in some GitHub repo.

Thanks, it would be great if you investigate the SoniTranslate models. Please let me know if you have any questions about it or if there is something else happening relating to it that is not an update of the page you linked which I'm following.

I have already submitted a Wishlist proposal about that with "A tool for auto-transcription to speed up the creation of TimedTexts subtitles for videos on Commons". I could create a separate one for dubbing videos into other languages but a) so far there's not much activity on that proposal and it's unclear how the Wishlist will be dealt with (previously people voted on things and there was a results table which largely did not get implemented, but now activity seems mostly limited to low-level questions on the proposal's talk page, with no change in regards to implementation) and b) that could be implemented as an extension of what I was proposing there: if that gets implemented, one could adapt that solution to also make it easier for users to go through a guided video redubbing process. Moreover, I have many other technical ideas and want to cut down on how many I submit because there still is no change in regards to capacity for implementing wishes and because I have quite a number of them in my notes (most of these not as useful as the ones I did propose). Adding subtitles also seems like the first step to do for multilingualism. I was thinking of submitting a proposal about an export function for Wikipedia articles that can be used by text-2-speech and/or a Wikipedia article-to-audio tool. Maybe I'll do that, and I'm currently looking into browser extensions that could do so with a click by setting some CSS. What's needed more is a proper audio player as described in the page, where one can e.g. skip back 10 seconds or skip to a section, so maybe I'll only submit that.

@Prototyperspective wanted to reply to your comment re: the wishlist.

Limited activity on your specific proposal is not necessarily a "bad" thing. Though voting on individual wishes produced some level of support and commentary, it had minimal impact on prioritization of specific wishes or action being taken. We do anticipate a lag between a wish being proposed and a wish being fulfilled, in part because we want to form associations between wishes and then understand the root of the problem to be solved.

The specific request is a real user pain point, but given the limited technical resources to support Commons, we need to ask real prioritization questions, informed by community feedback.

leila renamed this task from SDS 1.2.1 Hypothesis - gather AI Product use-cases, and identify external ML models to be tested to SDS 1.2.1.A Hypothesis - gather AI Product use-cases, and identify external ML models to be tested. Oct 4 2024, 9:00 PM
leila updated the task description.

@Miriam thanks for your work and the team's work on this hypothesis and congrats on concluding it. As I have mentioned to you separately, I'm really happy to see how much we have learned as a result of this work over a span of a few months.

  • I updated the task title and added "A", as agreed and per program management guidance, knowing you will have a follow-up hypothesis SDS 1.2.1 B (or potentially two).
  • I'm going to resolve this task as the work is done. If you want to copy/paste your final report for this hypothesis here, go ahead and do it. Otherwise, the task description and links to sub-tasks provide good pointers for the different components that you aimed to do.
  • I'm going to resolve this task as the strict hypothesis language did not require arriving at the prioritized list of applications. I know you have done that, and I know we will talk in a few days to finalize it. But that's not a blocker for closing this one imo.