🗺 Et voilà, a map (or at least an attempt at a map) of the data governance requirements per Article 10 of the AI Act. (Because, obviously, no one had ever thought about ensuring their system doesn't turn into Skynet 2.0 by implementing data quality and performance metrics.)
💡 Article 10 essentially consolidates the requirements regarding the use of data for the training, validation and testing, and to an extent the deployment, of AI systems. Groundbreaking? Hardly. However, due to the somewhat arbitrarily ordered list of requirements, the real challenge lies in determining what should be done, when, and by whom. Hence my attempt to bring some order to this chaos, trying not to lose my sanity in the process.
Important points:
👉 All of these are obligations of the provider of a high-risk AI system.
👉 Deployers remain responsible for the system's input data (to the extent they exercise control over it).
👉 Not listed in Article 10 but still relevant: the obligation to document the training methodologies and techniques, as well as the training data sets used, in the form of datasheets (Annex IV; technical documentation according to Article 11(1)).
👉 Specifications for the input data, or any other relevant information regarding training, validation and testing data sets, have to be communicated to deployers as part of the 'Instructions for use' (Article 13(3)).
👉 Supposedly, all potential headaches can be avoided by relying on third parties that offer certified compliance services for verifying data governance, data set integrity, and data training, validation and testing practices (Recital 67). (I'm highly sceptical of the general effectiveness of any such certification scheme: nothing screams 'trust' like a stamp of approval on 'appropriateness'.)
👉 And, in a twist that should surprise absolutely no one, every other conceivably applicable data-related law and regulation applies as well. (Most of these are, in my opinion, especially relevant in the 'Data collection' phase.)
Personal thoughts:
👉 This is an (extremely) simplified map that completely disregards the fact that, e.g., bias or functional appropriateness can hardly be assessed separately from the algorithm or the system as a whole (see the toy sketch after this list). (I still think it's better than nothing.)
👉 The data governance article kind of ignores the fact that most problems can hardly be solved by adjusting (manipulating?) the data.
👉 The 'appropriateness' assessments will be particularly tricky considering most AI providers aim to develop a single system, rather than an array of separate systems for every geography, ethnicity, culture, language, etc.
👉 There are a whole lot of overlaps with a whole lot of regulations, which means good management, established processes and qualified people are the only way to avoid redundant tasks and assessments.
👉 I'm especially interested in witnessing the inevitable attempts at circumventing the (additional) transparency requirements. 🙃
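
To make the first point above about bias concrete, here is a minimal, entirely hypothetical Python sketch. The fairness metric (demographic parity), the feature names and the numbers are my own illustration, not anything prescribed by Article 10. It shows a data set whose labels are perfectly balanced across two groups, yet a model that keys on a correlated proxy feature still produces maximally skewed outcomes:

```python
# Hypothetical toy illustration (mine, not the AI Act's): a fairness check
# that cannot be run on the data alone, because it needs the model's outputs.

def demographic_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rates between groups A and B."""
    rates = {}
    for g in ("A", "B"):
        preds = [p for p, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    return abs(rates["A"] - rates["B"])

groups  = ["A", "A", "B", "B"]
zipcode = [1, 1, 2, 2]   # proxy feature that happens to correlate with group
labels  = [1, 0, 1, 0]   # the data itself is balanced: 50% positives per group

# A model that (inadvertently) keys on the proxy feature:
predictions = [1 if z == 1 else 0 for z in zipcode]

print(demographic_parity_gap(labels, groups))       # 0.0 -> data-only audit sees no imbalance
print(demographic_parity_gap(predictions, groups))  # 1.0 -> the deployed system is maximally skewed
```

The same training data can thus pass every data-level check and still yield a discriminatory system, depending entirely on what the model learns to rely on.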