Artificial intelligence (AI) and machine learning (ML) by their nature require large amounts of data to produce accurate results. AI and ML are revolutionizing how we interact with data, and the General Data Protection Regulation (GDPR) regulates what that interaction is allowed to look like. In this post, I will address why AI and ML applications need large amounts of data to be successful. Then I will discuss what GDPR is and how it affects parts of the AI and ML space such as personalized systems. Finally, I will analyze how to integrate AI and ML into a GDPR-compliant system and provide an overview of the consequences of violating the GDPR.
Why are AI and ML so affected by GDPR?
AI and ML solutions often require a very large amount of data to achieve accurate results. How much data a model requires depends on the type of model and the number of features. Features are the individual characteristics or properties captured in the data and used to determine the output of the algorithm. Think of the data as a spreadsheet in which each row is one example and each column is one feature of that example. Models trained on too little data are susceptible to overfitting. Overfitting happens when a model fits the given data too closely and therefore fails to represent additional data accurately. For example, the table below represents a data set of five rows with the features name, age, gender, and country. This data set will be used to predict hair color.
Having only five rows makes this data set very prone to overfitting. A model trained on it may start to associate the name George with having red hair. This happens because, within this data set, the association holds 100% of the time at n=2 (two instances of the association, or two of our five rows). As humans, we know that the name George does not imply that a person has red hair. If our data set were significantly larger, we could expect at least one person with red hair whose name is not George and at least one person named George who does not have red hair. The model represents the initial five-row data set but is not representative of a larger data set, and is therefore overfitting.
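The George example can be made concrete with a small sketch. The rows below are invented for illustration and are not the table from this post, and the "model" is a deliberately naive name lookup that ignores the other features (age, gender, country). It scores perfectly on the five rows it memorized and poorly on rows it has not seen.

```python
from collections import Counter, defaultdict

# Hypothetical training rows invented for illustration: (name, hair_color).
train = [
    ("George", "red"),
    ("George", "red"),
    ("Maria", "brown"),
    ("Liam", "black"),
    ("Aiko", "black"),
]


def fit_name_lookup(rows):
    """Memorize the most common hair color seen for each name."""
    counts = defaultdict(Counter)
    for name, hair in rows:
        counts[name][hair] += 1
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


def accuracy(model, rows, default="brown"):
    """Fraction of rows whose hair color the lookup predicts correctly."""
    hits = sum(model.get(name, default) == hair for name, hair in rows)
    return hits / len(rows)


model = fit_name_lookup(train)

# Hypothetical unseen rows: most people named George do not have red hair.
unseen = [
    ("George", "brown"),
    ("George", "black"),
    ("George", "blond"),
    ("George", "red"),
    ("Maria", "brown"),
    ("Liam", "blond"),
]

train_accuracy = accuracy(model, train)
unseen_accuracy = accuracy(model, unseen)
print(f"train accuracy:  {train_accuracy:.2f}")
print(f"unseen accuracy: {unseen_accuracy:.2f}")
```

The gap between the two scores is the symptom of overfitting: the lookup learned "George means red hair" from two rows, and that rule does not survive contact with more data.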
Machine learning and artificial intelligence models require large data sets to reduce the likelihood of overfitting. But large data sets require automated processing, and that can introduce the need for GDPR compliance.
What is GDPR and how does it affect AI/ML applications?
The GDPR was adopted by the European Union (EU) in 2016 and took effect in the spring of 2018, and it continues to govern the personal data of individuals in the EU. The GDPR was created to standardize the laws regulating data protection throughout the EU. The goal is to give people in the EU a better understanding of how their data is being used and a course of action for voicing concerns and complaints. Failure to comply with GDPR can result in fines of up to €20 million or 4% of annual global turnover, whichever is higher. GDPR is a hefty document with 99 articles spanning 11 chapters and 88 pages. There are some great resources on the common articles of GDPR from groups such as the Information Commissioner’s Office. I will be covering just the parts relevant to ML and AI applications, specifically Article 22. Article 22 of GDPR is titled “Automated Individual Decision-Making, Including Profiling” and can be referenced on the EU’s website.
Personalized systems and experiences are some of the most affected by GDPR. Previously, advertising technology companies were able to curate ads for a user based on a recommendation algorithm of sorts. Often these recommendation engines would ingest personal user data for their algorithms. This processing of, and automated decision-making on, personal user data can be prohibited by the GDPR.
How to implement AI and ML compliant with GDPR Article 22
The short paragraphs of Article 22 leave room for a lot of possibilities. A data controller, which we will refer to as a company or business, is able to process data under Article 22 for non-decision purposes. A company or business cannot make an automated decision about an individual unless at least one of the following applies:
The individual does not object (typically done through privacy policies).
The decision does not have the potential to significantly impact the choices of the individual, permanently impact the individual, or lead to discrimination against the individual.
The company’s model is using anonymized data.
The company does not allow the decision to be fully automated and requires some level of human involvement.
The individual gives explicit consent.
There is a contract between the individual and company that requires this automated decision and the company implements sufficient safeguards and provides a human to address an individual’s questions and concerns.
The decision is authorized by EU or member state law that applies to the company and that requires appropriate safeguards in data processing.
For the last three points, the model should not rely on sensitive features such as racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data used to uniquely identify a natural person, data concerning health, or data concerning a natural person’s sex life or sexual orientation.
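As a rough sketch, some of the conditions above could be encoded as a gate that runs before a model's output is applied automatically. This is a simplified reading of this post's list, not the text of Article 22, and every name here (`DecisionContext`, `may_fully_automate`, `route`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DecisionContext:
    """Hypothetical flags a company might record for each automated decision."""
    significant_effect: bool        # could the decision significantly affect the person?
    explicit_consent: bool          # did the person explicitly consent?
    contract_with_safeguards: bool  # contract requires it and safeguards exist
    uses_sensitive_features: bool   # e.g. health data, ethnic origin, political opinions


def may_fully_automate(ctx: DecisionContext) -> bool:
    """Simplified check based on the conditions listed in this post."""
    if not ctx.significant_effect:
        # Low-impact decisions are outside the restriction described above.
        return True
    if ctx.uses_sensitive_features:
        # Simplification: treat sensitive features as a blocker.
        return False
    return ctx.explicit_consent or ctx.contract_with_safeguards


def route(ctx: DecisionContext) -> str:
    """Apply the decision automatically, or hand it to a person for review."""
    return "automated" if may_fully_automate(ctx) else "human_review"


low_impact = DecisionContext(False, False, False, False)
no_basis = DecisionContext(True, False, False, False)
consented = DecisionContext(True, True, False, False)
sensitive = DecisionContext(True, True, False, True)

print(route(low_impact), route(no_basis), route(consented), route(sensitive))
```

Routing to `human_review` corresponds to the option of not letting the decision be fully automated. A real system would need a legal review of each branch rather than relying on a boolean gate like this one.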
Personalized systems, including targeted advertising, have had to shift their model of ad curation in the EU under GDPR by abstracting away the use of personal data. I am sure we have all browsed for an item online and shortly after found it plastered all over the ads on the next pages we visit. The processing of personal data often used to select those advertisements is not permitted under GDPR without consent. Companies have had to move their recommendation formulas away from user browsing history and toward other available data such as page content, user location, and time of day.
For example, suppose a user searches for camping gear and then proceeds to visit another site and search for computer monitors. Prior to GDPR, this user may have been given camping ads on the computer monitor page, as advertising technology companies had access to this type of information. However, this same scenario post-GDPR would likely not have camping ads on the computer monitor site. This is because the advertising companies no longer have the ability to process personal data about the user that they had previously used to select advertisements. The ads post-GDPR will likely be based on the page contents (computer monitors), location of user, time of day, etc.
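A minimal sketch of the post-GDPR behavior in this scenario follows. The ad inventory, keywords, and function name are hypothetical, and the selection rule is a plain keyword overlap rather than any real ad platform's method. The point is that the function only receives the current page's text and has no parameter for the user's browsing history.

```python
# Hypothetical ad inventory: ad name -> keywords describing the ad.
ADS = {
    "4K monitor sale": {"monitor", "display", "computer"},
    "Two-person tent": {"camping", "tent", "outdoor"},
    "Mechanical keyboard": {"keyboard", "computer", "typing"},
}


def pick_contextual_ad(page_text: str, ads=ADS) -> str:
    """Pick the ad whose keywords overlap most with the words on the page."""
    words = set(page_text.lower().split())
    return max(ads, key=lambda ad: len(ads[ad] & words))


# The user searched for camping gear earlier, but that history is never passed in.
page = "Compare the best computer monitor and display deals"
chosen = pick_contextual_ad(page)
print(chosen)
```

On the computer monitor page, the monitor ad wins because it matches the page contents, regardless of the earlier camping search.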
The GDPR does not restrict models previously trained on non-GDPR-compliant data. For an existing model to become non-compliant, it must continue to automatically process the non-compliant data. Large, heavily affected tech companies such as Facebook and Google have done extensive work to become GDPR compliant. Both companies have sections of their websites explaining the steps they have taken to comply with GDPR, including outlining their updated policies and providing a place for someone to raise a complaint or opt out.
What happens if you break the rules?
GDPR is a large set of rules. What happens if we break some? Fines for violating GDPR are discretionary and are imposed on a case-by-case basis. The tier of fine is based on the article infringed. Violating Article 22 falls under the second tier of fine. Fines for violating Article 22 can be up to €20 million or 4% of annual global turnover, whichever is higher. There are 10 criteria used to assess how much a fine will be. These are:
Nature of Infringement: How many people are affected, damage suffered, and nature of processing?
Intention: Was this infringement intentional or negligent?
Mitigation: What actions are being taken to mitigate damage to individuals?
Preventative measures: What prior steps were taken to prevent infringement?
History: Previous GDPR and Data Protection Directive infringements?
Cooperation: Did the company cooperate and show readiness to fix the infringement?
Data Type: What type of data was infringed?
Notification: Did the company give proper and timely notification?
Certification: Was the company qualified under approved certifications?
Other: Financial impact to business, etc.
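The upper bound described above for Article 22 violations ("€20 million or 4% of annual global turnover, whichever is higher") is simple arithmetic. The sketch below uses a hypothetical function name and only computes the ceiling. The actual fine is discretionary and depends on the criteria just listed.

```python
def max_article_22_fine(annual_global_turnover_eur: float) -> float:
    """Ceiling of a second-tier fine: the higher of EUR 20 million or 4% of turnover."""
    fixed_cap = 20_000_000
    turnover_cap = annual_global_turnover_eur * 4 / 100
    return max(fixed_cap, turnover_cap)


# A company with EUR 100 million turnover: 4% is 4 million, so the 20 million cap applies.
small = max_article_22_fine(100_000_000)

# A company with EUR 1 billion turnover: 4% is 40 million, which exceeds 20 million.
large = max_article_22_fine(1_000_000_000)

print(small, large)
```

The turnover-based cap only takes over once annual global turnover exceeds €500 million, since 4% of that figure equals €20 million.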
How to comply
The information in this article provides a solid understanding of GDPR, how it affects AI and ML technologies, and how to comply with the legal requirements. The most common route to GDPR compliance is consent. There are good examples online of consent documents and the specific legal requirements they must meet. If you need additional assistance or have further questions about anything mentioned in this article, please feel free to reach out to us at firstname.lastname@example.org.