Navigating the Waters: Senator Cantwell's Bipartisan Privacy Bill
How New Legislation Could Reshape AI Development
For years, many activists, many legislators, and a few everyday people have called for a national privacy law. But privacy bill after privacy bill has been dashed on the rocks of Maria Cantwell’s Senate Commerce Committee chairmanship. After the jetty cracks your hull, it becomes your refuge. On April 7, 2024, Senator Cantwell (D-WA) released a draft of her American Privacy Rights Act. Cantwell wants you to know it’s different this time. This is the first time a comprehensive privacy bill has been supported by both parties, both chambers of Congress, and Senator Cantwell.
But slipped within the pages of the painstakingly negotiated privacy provisions are sections that will have massive consequences for the development of AI. Among other things, the bill requires impact assessments prior to deployment of covered algorithms. It contains ill-defined limits on the data a company can use to train an algorithm. And it defines algorithms so broadly that any statistical computational process could be covered.
The most important sections for AI are Section 2 (Definitions), Section 3 (Data minimization), Section 13 (Civil rights and algorithms), and Section 14 (Consequential decision opt out).
Section 2. Definitions.
The bill defines a “covered algorithm” as “a computational process, including one derived from machine learning, statistics, or other data processing or artificial intelligence techniques, that makes a decision or facilitates human decision-making by using covered data, which includes determining the provision of products or services or ranking, ordering, promoting, recommending, amplifying, or similarly determining the delivery or display of information to an individual.”1
Government AI definitions are often so broad that they sweep in more activities than the government intended, or so vague that it is unclear what was intended to be covered. Cantwell’s definition is broad. It covers any statistical computational process that makes decisions or facilitates decision-making using covered data. This could include programmatic advertising, high-frequency trading, chess move calculators, and certain methods for dating pottery.
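To see how little it takes to fall within the definition, consider a toy recommender. The sketch below is hypothetical (the function and data are invented for illustration), but a few lines of elementary frequency counting that rank items for a specific individual would appear to qualify: the code “recommend[s] . . . the delivery or display of information to an individual” using data linked to that individual.

```python
from collections import Counter

def recommend_products(purchase_history: list[str], catalog: list[str]) -> list[str]:
    """Rank catalog items by how often this user bought them before.

    purchase_history is linked to an identifiable individual, so it is
    plausibly "covered data"; ranking and recommending with it appears
    to make this a "covered algorithm" under the draft's definition.
    """
    counts = Counter(purchase_history)  # elementary statistics, no ML required
    return sorted(catalog, key=lambda item: counts[item], reverse=True)

# One user's purchase history (covered data) drives the recommendation.
print(recommend_products(["tea", "tea", "coffee"], ["coffee", "tea", "cocoa"]))
# ['tea', 'coffee', 'cocoa']
```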
This definition of “algorithm” is in some respects narrower and in other respects broader than the definition of “artificial intelligence” in Biden’s October 30, 2023 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. It is narrower in two ways. First, Cantwell’s definition is limited by the purpose of the algorithm (i.e., decision-making); the EO does not limit its scope to systems that make decisions. Second, Cantwell’s definition reaches only algorithms that use “covered data”; the EO’s AI definition does not mention data at all. It is broader in that it applies to any decision-making system, whether derived from machine learning, statistics, or other data processing or AI techniques.
“Covered data” is “information that identifies or is linked or reasonably linkable, alone or in combination with other information, to an individual or a device that identifies or is linked or reasonably linkable to 1 or more individuals.” There are five exclusions: (i) de-identified data; (ii) employee information; (iii) publicly available information; (iv) certain types of inferences made exclusively from multiple independent sources; and (v) information in certain types of libraries or museums.2
A “covered entity” is one that “determines the purposes and means of collecting, processing, retaining, or transferring covered data” and is subject to the Federal Trade Commission Act (FTCA), is a common carrier, or is a nonprofit.3 This would include open source AI nonprofits like EleutherAI. There are notable carve-outs for governments, entities servicing the government, and small businesses. The government-servicing carve-out appears to mean that AI companies are exempt from this law so long as they land a government contract. This punishes companies with stiffer regulatory requirements if they do not work with the government. Yet the definition contains no provision preventing entities from simultaneously working for the US government and for adversarial entities.
Small businesses are entities with annual revenue under $40 million that do not deal with the covered data of more than 200,000 individuals and do not transfer covered data to a third party for consideration.
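A rough sketch of how these three thresholds interact (the function and argument names are invented; the draft’s actual tests involve further detail, such as look-back periods):

```python
def qualifies_as_small_business(annual_revenue: float,
                                individuals_with_data: int,
                                transfers_data_for_consideration: bool) -> bool:
    """Rough sketch of the draft's small-business test (names invented).

    All three conditions must hold; failing any one brings the entity
    within the bill's full scope.
    """
    return (annual_revenue < 40_000_000
            and individuals_with_data <= 200_000
            and not transfers_data_for_consideration)

# A modest startup that sells user data to a broker would not qualify:
print(qualifies_as_small_business(5_000_000, 50_000, True))   # False
print(qualifies_as_small_business(5_000_000, 50_000, False))  # True
```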
Section 3. Data minimization.
This section requires that covered entities “shall not collect, process, retain, or transfer covered data beyond what is necessary, proportionate, and limited to provide or maintain” (A) a specific product or service requested by the individual; or (B) “a communication by the covered entity to the individual reasonably anticipated within the context of the relationship.”4 The meaning of “necessary, proportionate, and limited” is unclear. How much data is necessary to maintain a product? Who decides? What if the company is interested in building a new product—one that the user has not specifically requested? Products from iPhones to social media sites regularly roll out popular features that users did not request. When does a new feature become a product or service such that it must be “requested by the individual”? What if the company wants to do R&D? That appears to fall outside the scope of what is necessary to provide a service requested by an individual. Will a company need to get users’ permission to innovate going forward?
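The compliance problem is easy to state in code and hard to resolve on paper. In the purely illustrative sketch below (the purpose-to-fields mapping is invented), the unanswered statutory questions live in the NECESSARY_FIELDS table: someone must decide, field by field, what is “necessary, proportionate, and limited,” and the draft does not say who.

```python
# Hypothetical mapping: which fields are "necessary, proportionate, and
# limited" for each requested service? The draft does not say who decides.
NECESSARY_FIELDS = {
    "shipping": {"name", "street_address", "postal_code"},
    "newsletter": {"email"},
}

def minimize(record: dict, requested_service: str) -> dict:
    """Retain only the covered data needed for the service the user requested."""
    allowed = NECESSARY_FIELDS.get(requested_service, set())
    return {k: v for k, v in record.items() if k in allowed}

user = {"name": "A. Reader", "email": "a@example.com",
        "street_address": "1 Main St", "postal_code": "98101",
        "browsing_history": ["..."]}
# browsing_history supports R&D, not a requested service, so it is dropped;
# under the draft, collecting it in the first place may be the violation.
print(minimize(user, "shipping"))
```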
Clause (B) is even harder to understand, though the issue here is grammatical. (B) requires entities to refrain from collecting data beyond what is necessary to “provide or maintain . . . a communication by the covered entity.” Presumably, not much data is required to provide “a communication” to an individual. And what does it mean to “maintain . . . a communication”?
Section 13. Civil rights and algorithms.
This is the real meat of the bill. Section 13(a)(1) provides that “[a] covered entity or a service provider may not collect, process, retain, or transfer covered data in a manner that discriminates in or otherwise makes unavailable the equal enjoyment of goods or services on the basis of race, color, religion, national origin, sex, or disability.”5 There are exceptions. Under Section 13(a)(2), the prohibition does not apply to “the collection, processing, retention, or transfer of covered data for the purpose of . . . diversifying an applicant, participant, or customer pool.” In other words, a covered entity may discriminate on the basis of race, color, religion, national origin, sex, or disability if the purpose of that discrimination is to diversify the entity’s applicants, participants, or customers.
Section 13(c) requires an annual algorithm impact assessment to ensure compliance with Section 13(a)(1): “a large data holder that uses a covered algorithm in a manner that poses a consequential risk of a harm” to an individual or group of individuals and uses that algorithm to collect, process, or transfer covered data shall conduct an impact assessment of such algorithm. This provision may substantially change how the average consumer interacts with AI systems. Because it requires a company to conduct an impact assessment whenever its algorithm poses a risk of harm to an individual, companies like OpenAI may be disincentivized from making their services available to consumers for free. Would OpenAI still make ChatGPT available to the public for free if that subjected it to reporting requirements?
Algorithms that pose a consequential risk of harm are those that produce harms related to (I) individuals under 17 years old; (II) making or facilitating advertising for, or determining access to, or restrictions on the use of housing, education, employment, healthcare, insurance, or credit opportunities; (III) determining access to, or restrictions on the use of, any place of public accommodation, particularly as such harms relate to the protected characteristics of individuals, including race, color, religion, national origin, sex, or disability; (IV) disparate impact on the basis of individuals’ race, color, religion, national origin, sex, or disability status; or (V) disparate impact on the basis of individuals’ political party registration status.6
The impact assessment must include “(i) A detailed description of the design process and methodologies of the covered algorithm. . . . (iv) A description of the outputs produced by the covered algorithm. . . . (vi) A detailed description of steps the large data holder has taken or will take to mitigate potential harms from the covered algorithm to an individual.”7 How much must be disclosed about a company’s methodology? How does a company present this in a way that would be meaningful to consumers? How much time and money must it spend to do so? How must a company describe its outputs? If the outputs are incredibly varied, is it enough to say so, or must the outputs themselves be represented in the assessment? Does the description need to cover every possible range of outputs? Before a company rolls out an algorithm that helps it decide what to put in an ad for housing, it must detail the steps it took to mitigate any possible harms related to “the use of housing, education, employment, healthcare, insurance, or credit opportunities.” A company must outline how it’s working to mitigate “potential harms”: not the steps it’s taking to stem realized harms, but the steps it’s taking to prevent ever committing one.
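To make the burden concrete, here is a hypothetical skeleton of an assessment covering just the three enumerated items quoted above (all names are invented; nothing in the draft prescribes a format):

```python
from dataclasses import dataclass, field

@dataclass
class ImpactAssessment:
    """Hypothetical skeleton of items (i), (iv), and (vi) quoted above."""
    design_process: str          # (i) design process and methodologies
    output_description: str      # (iv) outputs produced by the algorithm
    mitigation_steps: list[str] = field(default_factory=list)  # (vi) potential-harm mitigations

assessment = ImpactAssessment(
    design_process="How detailed is detailed enough? The draft does not say.",
    output_description="For a general-purpose model, the outputs are unbounded.",
    mitigation_steps=["Steps to prevent harms that have not yet occurred."],
)
print(assessment)
```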
The bill goes further, putting prior restraint on algorithms. This goes beyond yearly reporting. Prior to deploying a covered algorithm, a covered entity must “evaluate the design, structure, and inputs of the covered algorithm, including any training data used to develop the covered algorithm, to reduce the risk of the potential harms.”8 The implications for open source machine learning could be devastating: a deployer of an openly released model may have no access to the training data it is obliged to evaluate, and the original developer has no control over downstream deployments.
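A literal reading of the pre-deployment duty might reduce to something like the following sketch (entirely hypothetical; the draft does not specify what an adequate evaluation involves). For open source models, the training data argument is the sticking point: a deployer of downloaded weights often cannot supply it at all.

```python
def evaluate_source(source: str) -> bool:
    # Placeholder: the draft does not define what an adequate evaluation is.
    print(f"Evaluating training data source {source!r} for potential harms...")
    return True

def pre_deployment_evaluation(training_data_sources: list[str]) -> bool:
    """Hypothetical sketch of the draft's pre-deployment evaluation duty.

    The entity must evaluate "the design, structure, and inputs of the
    covered algorithm, including any training data." A deployer of an
    open-weight model may have no access to that training data at all.
    """
    if not training_data_sources:
        # Many openly released models ship without their training corpora.
        raise ValueError("cannot evaluate training data that was never disclosed")
    return all(evaluate_source(src) for src in training_data_sources)

# A first-party model with documented data sources can at least attempt this:
pre_deployment_evaluation(["licensed-news-archive", "first-party-logs"])
# A downloaded open-weight model with undisclosed data cannot:
# pre_deployment_evaluation([])  # raises ValueError
```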
Section 14. Consequential decision opt out.
“An entity that uses a covered algorithm to make or facilitate a consequential decision shall provide” notice to any individual subject to such use of the covered algorithm; and an opportunity for the individual to opt out of such use of the covered algorithm.9 If a person opts out, does the company have to offer the individual an alternative to the covered algorithm? If not, must the individual receive a refund for the company’s now-unusable services? Why would an individual opt out if they had contracted with a company in order to use the company’s services in the first place?
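Mechanically, Section 14 might reduce to a gate like the one below (all names are hypothetical). The draft mandates that the opt-out branch exist but says nothing about what belongs inside it.

```python
def notify(user_id: str, message: str) -> None:
    print(f"[notice to {user_id}] {message}")

def covered_algorithm(features: dict) -> bool:
    # Stand-in scoring model for the covered algorithm.
    return sum(features.values()) > 0

def make_consequential_decision(user_id: str, opted_out: set, features: dict):
    """Gate a covered algorithm behind Section 14's notice and opt-out."""
    notify(user_id, "A covered algorithm will be used for this decision.")
    if user_id in opted_out:
        # Section 14 requires this branch to exist but does not say what
        # goes in it: human review? a refund? no service at all?
        return None
    return covered_algorithm(features)

print(make_consequential_decision("u1", {"u1"}, {"income": 1}))  # None: opted out
print(make_consequential_decision("u2", {"u1"}, {"income": 1}))  # True
```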
Conclusion
This discussion draft poses many problems for AI development. However, Chairs Cantwell and Rodgers should be applauded for publicly releasing a discussion draft for feedback so early in the process. As this legislation moves forward, members of Congress should focus on how these policies will affect the AI of today and tomorrow. Crafting the right policies for AI development is crucial so that all Americans can realize the full benefits of these burgeoning technologies.