Licensure as Data Governance
By Frank Pasquale
DATA AND DEMOCRACY
In October 2020, the Knight First Amendment Institute at Columbia University convened a virtual symposium, titled “Data and Democracy,” to investigate how technological advances relating to the collection, analysis, and manipulation of data are affecting democratic processes, and how the law must adapt to ensure the conditions for self-government. This symposium was organized by the Institute’s 2019-2020 Senior Visiting Research Scholar, Yale Law Professor Amy Kapczynski, and co-sponsored by the Law and Political Economy Project at Yale Law School.
The essays in this series were originally presented and discussed at this two-day event. Written by scholars and experts in law, computer science, information studies, political science, and other disciplines, the essays focus on three areas that are both central to democratic governance and directly affected by advancing technologies and ever-increasing data collection: 1) public opinion formation and access to information; 2) the formation and exercise of public power; and 3) the political economy of data.
The symposium was conceptualized by Knight Institute staff, including Jameel Jaffer, Executive Director; Katy Glenn Bass, Research Director; Amy Kapczynski, Senior Visiting Research Scholar; Alex Abdo, Litigation Director; and Larry Siems, Chief of Staff. The essay series was edited by Glenn Bass with additional support from Lorraine Kenny, Communications Director; A. Adam Glenn, Writer/Editor; and Madeline Wood, Research Coordinator.
The full series is available at knightcolumbia.org/research/
[In the late 1950s, the U.S.] government abdicated … responsibility to establish rules, safeguards, and standards relating to the collection and use of personal data for the purpose of directing human behavior. Plainly, all of this might have gone differently. Plenty of people believed at the time that a people machine was entirely and utterly amoral. “My own opinion is that such a thing (a) cannot work, (b) is immoral, (c) should be declared illegal,” [soon-to-be-FCC Chair] Newton Minow had written to Arthur Schlesinger in 1959. “Please advise.”

Jill Lepore, If Then: How the Simulmatics Corporation Invented the Future, 323.
INTRODUCTION
Data protection regulators face a crisis of overwork and underresourcing. Enforcement of privacy laws is too often belated, if it comes at all. Massive firms with myriad data points on tens of millions of people face fines for data misuse and security breaches that are the economic equivalent of a parking ticket. Potentially worse than all these well-recognized barriers to accountability is a known unknown: namely, the black box problem. Even the most diligent regulators and civil society groups have little idea of the full scope and intensity of data extraction, analysis, and use at leading firms, given the triple barriers of trade secrecy, nondisclosure agreements, and technical complexity now effectively hiding their actions from public scrutiny. This crisis is likely to continue unless there is a fundamental shift in the way we regulate the collection, analysis, transfer, and use of data.1
At present, policymakers tend to presume that the data practices of firms are legal, and only investigate and regulate when there is suspicion of wrongdoing. What if the presumption were flipped? That is, what if a firm had to certify that its data practices met clear requirements for security, nondiscrimination, accuracy, appropriateness, and correctability, before it collected, analyzed, or used data?2 Such a standard may not seem administrable now, given the widespread and rapid use of data—and the artificial intelligence (AI) it powers—at firms of all sizes. But such requirements could be applied, at first, to the largest firms’ most troubling data practices, and only gradually (if at all) to smaller ones and less menacing data practices. For example, would it really be troubling to require firms to demonstrate basic security practices once they have accumulated sensitive data on over 1 million people, before they continue to collect even more? Scholars have argued that certain data practices should not be permitted at all.3 Rather than expecting underfunded, understaffed regulators to overcome the monumental administrative and black box problems mentioned above, responsibility could be built into the structure of data-driven industries via licensure schemes that require certain standards to be met before large-scale data practices expand even further.4
To give a concrete example motivating this flipped presumption about data practices, consider the emergence of health inferences from data that is not, on its face, health-predictive. For instance, an AI program, reviewing only writing samples, “predicted, with 75 percent accuracy, who would get Alzheimer’s disease.”5 This type of inference could be used in subtle or secretive ways by the firms making it, as well as by employers, marketers, financial institutions, and other important decision-makers.6 Such predictions may have massive impacts on those projected to have Alzheimer’s, including denial of life insurance or long-term care insurance, denial of employment, or loss of other opportunities. Even where such uses of the data are illegal, complex and expensive legal systems may make it very difficult to enforce one’s rights. Governments should ensure ex ante that predictions are done and used in a responsible way, much as federally funded research is often channeled through institutional review boards in order to respect ethical and legal standards.7
A licensure regime for data and the AI it powers would enable citizens to democratically shape data’s scope and proper use, rather than resigning ourselves to being increasingly influenced and shaped by forces beyond our control. To ground the case for more ex ante regulation, Part I describes the expanding scope of data collection, analysis, and use, and the threats that that scope poses to data subjects. Part II critiques consent-based models of data protection, while Part III examines the substantive foundation of licensure models. Part IV addresses a key challenge to my approach: the free expression concerns raised by the licensure of large-scale personal data collection, analysis, and use. Part V concludes with reflections on the opportunities created by data licensure frameworks and potential limitations upon them.
I. THE EXPANDING SCOPE OF DATA COLLECTION, ANALYSIS, AND USE
As data collection becomes more prevalent, massive firms are privy to exceptionally comprehensive and intimate details about individuals.8 These can include transport, financial, retail, health, leisure, entertainment, location, and many other kinds of data. Once large enough stores of data are created, there are increasing opportunities to create inferences about persons based on extrapolations from both humanly recognizable and ad hoc, machine learning-recognizable groups.9
Much observation can occur without persons’ consent. Even when consent is obtained, improper data collection may occur. Increasingly desperate individuals may be effectively coerced via their circumstances to permit comprehensive, 360-degree surveillance of key aspects of their lives. For example, many people now cannot access credit because of a “thin credit file”—industry lingo for someone without much of a repayment history. Fintech firms promise “financial inclusion” for those willing to give lenders access to their social media activity and other intimate data.10 Critics characterize this “opportunity” as predatory inclusion, and it is easy to see why: Those enticed into the sphere of payday lenders may not only contract into unsustainable debt-repayment schemes but may also become marks for other exploitative businesses, such as for-profit universities or unlicensed rehab centers.
The economic logic here (giving up more privacy in exchange for lower interest rates or other favorable terms) is, in principle, illimitable. For example, a firm may give a loan applicant a reduced rate on a car loan, if she allows it (and its agents) to download and then analyze all information on her mobile phone during the term of the loan, and to resell the data. Such borrowers may never even know what is done with their data, thanks to all-pervasive trade secrecy in the industry. Tracker apps on cell phones may allow firms to record their employees’ location at all times. According to one aggrieved worker, her boss “bragged that he knew how fast she was driving at specific moments ever since she had installed [an] app on her phone.”11
Even when such comprehensive surveillance is not consented to, AI can operate as a “prediction machine,” analyzing data to make damaging inferences about individuals.12 These inferences may be entirely unexpected, based on correlations that can only be found in vast troves of data.13 For example, a person’s proclivity to be depressed may be related to the apps or websites they visit (or how they use those apps and websites). The websites themselves may keep such data, or it may be collected by third parties with commercial or other relationships with the sites or apps.
Correlations based on how a person uses their phone or computer may be entirely unexpected. For example, a high-ranking Catholic cleric in the U.S. was recently reported to be a user of the gay dating site Grindr by journalists at the digital publication named The Pillar, based on computing data. As journalist Molly Olmsted explained:

According to one privacy engineer who has worked on issues related to location data, Pillar (or the group that had offered CNA [the Catholic News Agency] the data back in 2018) probably purchased a data set from a data broker, which in turn had likely purchased the data from a third-party ad network that Grindr uses. Grindr itself would not have been the source of the data, but the ad network would have been given full access to the users’ information as long as they agreed to Grindr’s terms of services. (In 2018, Grindr, which uses highly granular location information, was found to have shared users’ anonymized locations, race, sexual preferences, and even HIV status with third-party analytic firms.)14
Whatever your views about the cleric in this case, the generalized exposure of a person’s dating practices, or other intimate inferences based on location-based data, is something a mature privacy law should prevent.
Researchers have also analyzed certain activities of people who extensively searched for information about Parkinson’s disease on Bing, including their mouse movements six months before they entered those search terms.15 Most users of the internet are probably unaware that not just what they click on, but how fast and smoothly they move their mouse to do so, can be recorded and traced by the sites they are using. The group of Bing users who searched for Parkinson’s—which it is probably safe to assume is far more likely to have Parkinson’s than the population as a whole—tended to have certain tremors in their mouse movements distinct from other searchers. These tremor patterns were undetectable by humans—only machine learning could distinguish the group identified to have a higher propensity to have Parkinson’s, based in part on microsecond-by-microsecond differences in speed and motion of hand movement.
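To make the mechanics of such inference concrete, the sketch below trains a classifier on synthetic cursor traces. This is not the method used in the Bing study; the motion features, the 5 Hz jitter standing in for tremor, and the gradient-boosted model are all illustrative assumptions. The point is only that a handful of motion statistics, invisible to any human observer, can be enough to sort users into a higher-risk group.

```python
# Illustrative sketch only: NOT the pipeline from the cited study. It shows how
# a classifier can separate cursor traces with subtle high-frequency jitter
# (a stand-in for tremor) from ordinary traces, using assumed motion features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def tremor_features(t, xs, ys):
    """Summarize one cursor trace as a few motion statistics (assumed features)."""
    dt = np.diff(t)
    speed = np.hypot(np.diff(xs), np.diff(ys)) / dt
    accel = np.diff(speed) / dt[1:]
    reversals = (np.sign(accel[1:]) != np.sign(accel[:-1])).mean()  # rapid speed flips
    return np.array([speed.mean(), speed.std(), np.abs(accel).mean(), reversals])

def synthetic_trace(tremor, n=200):
    """Fake cursor trace sampled at roughly 60-120 Hz; tremor adds ~5 Hz jitter."""
    t = np.cumsum(rng.uniform(0.008, 0.016, n))
    xy = np.cumsum(rng.normal(0.0, 2.0, (n, 2)), axis=0)
    if tremor:
        xy += 1.5 * np.sin(2 * np.pi * 5.0 * t)[:, None]
    return t, xy[:, 0], xy[:, 1]

X = np.array([tremor_features(*synthetic_trace(tremor=i % 2 == 0)) for i in range(600)])
y = np.array([i % 2 == 0 for i in range(600)], dtype=int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("held-out AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```

Nothing in this toy pipeline requires consent, notice, or any medical context; the same handful of statistics could be computed by any site that logs cursor positions.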
There is at present no widely available defense to such detection—no close-at-hand, privacy-enhancing strategy that can prevent such technologies of inference, once developed, from classifying someone as likely to develop grave illness. Perhaps a clever technologist could develop tools to “fuzz” tremor movements, or to smooth or normalize their transmission so that they do not signal abnormality. But we should not expect internet users to defend themselves against such invasive categorization by joining an arms race of deflection and rediscovery, obfuscation and clarification, encryption and decryption.16 It is a waste of our time. And those of higher socioeconomic status have many more resources at hand to engage in such arms races, thus adding the insult of rising inequality to the injury of privacy harm.17
Moreover, a patchwork of weak and often underenforced privacy laws is no match for the threats posed by large-scale data processing, which can be used to either overtly or secretly rank, rate, and evaluate people, often to their detriment and unfairly.18 Without a society-wide commitment to fair data practices, a troubling era of digital discrimination will be entrenched. The more data about signals of future distress, illness, or disability are available, the better AI will predict these conditions, enabling unscrupulous actors to take advantage of them.
II. THE IMPRACTICALITY OF CONSENT-BASED MODELS IN AN AGE OF BIG DATA
Most models of data regulation begin with consent, shifting responsibility to data subjects to decide to whom they should grant their data, and to whom they should deny it. On the consent-based view, it is up to the data subject to vet the reliability of entities seeking access to her data and monitor their ongoing abidance with the promises they made when they obtained it. A consent-based model makes sense as part of a contractarian and propertarian view of legal order: Data are like objects, property of a person who may freely contract to give or share the data with others.19 On this view, just as a corporation needs consent to, say, take a person’s property, it needs consent to take personal data. Once a contract is struck, it governs the future exchange and use of the data.
This consent-based approach has multiple infirmities.20 Much data arises out of observation unrestricted by even theoretical contracts. To give an example: A Google user may “consent” to data collection while using the service, but no one is asked before the cameras on Google’s street-photographing cars roll down the road to grab data for its mapping service. The firm does blot out faces when publishing the images to the internet, but the principle remains: All manner of information can be garnered by microphones, cameras, and sensors trained on the public at large, and connected to particular persons, automobiles, or households via massive facial recognition databases. Aerial surveillance can trace a person’s every outdoor step and movement from home and back each day, as Baltimore’s recent “spy plane” litigation has revealed.21
Even for data that can be practically bound by contract, terms-of-service agreements are continually subject to change. These changes almost always favor service providers—despite the serious lock-in costs and reliance interests of users. If one is dealing with a monopoly or must-have service, there is no real choice but to accept its terms.22 Even when choice is superficially available, terms of service are by and large similar and imposed as a fait accompli: The user either accepts them or does not get to use the service. Sometimes, under exceedingly gentle pressure from regulators, firms will magnanimously offer assurances about certain limits on the uses of data.23
However, data subjects are now under surveillance by so many firms that it is impossible for them to audit such assurances comprehensively. How can a person with a job and family to take care of try to figure out which of thousands of data controllers has information about them, has correct information, and has used it in a fair and rigorous manner? In the U.S., even the diligent will all too often run into the brick walls of trade secrecy, proprietary business methods, and malign neglect if they do so much as ask about how their data has been used, with whom it has been shared, and how it has been analyzed.24 Europeans may make “subject access requests,” but there are far too many data gathering and data processing firms for the average person to conduct reviews of their results in a comprehensive way.
The analogy between data and property also breaks down in an era of digitization, when data can be so easily copied, transferred, stored, and backed up.25 Securing a house is a relatively easy matter compared with securing one’s data. Even the most conscientious data subjects need only slip once, failing to read a key clause in terms of service, or transacting with an unreliable or insecure counterparty. Then critical, discrediting, disadvantaging, or embarrassing data about them could end up copied and recopied, populating countless databases. And even when the data subject has clearly been wronged by a data controller or processor, the judiciary may make litigation to obtain redress difficult or impossible.26
As data and its analysis, sharing, and inferences proliferate, the consent model becomes less and less realistic. There is simply too much for any individual to keep track of. Nor have data unions risen to the challenge, given the immense difficulty of attaining any kind of bargaining leverage vis-à-vis first-party data collectors. There are myriad data gatherers, hundreds of scoring entities,27 and blacklists covering housing, voting, travel, and employment.28 Even if consent-based regimes are well-administered, they can result in data gaps that impede, for instance, both medical research and opportunities for clinical care. Forced into an environment where few adverse uses of compromising data and inferences are forbidden (and where the penalties for such uses are not sufficient to deter wrongdoers), data subjects easily forego valuable opportunities (such as future AI-enabled diagnoses) or eschew low-risk chances to contribute to the public good by sharing data (for research studies).
Moreover, we cannot reasonably expect good administration of consent-based regimes in many areas. In the U.S., regulatory agencies are ill-suited to enforce others’ contracts. Even when they do bring cases on the basis of deception claims, the penalties for failure to comply have frequently been dismissed as a mere cost of doing business. First Amendment defenses may also complicate any lawsuit predicated on an effort to stop the transfer or analysis of data and inferences. In Europe, while data protection authorities are empowered by law to advance the interests of data subjects via the General Data Protection Regulation (GDPR), they have in practice proven reluctant to impose the types of penalties necessary to ensure adherence to the law.29
III. BEYOND CONSENT: THE EX ANTE REGULATORY IMPERATIVE
One path to large-scale data processing that is more responsive to the public interest is to ensure that proper scrutiny occurs before the collection, analysis, and use of data. If enacted via a licensure regime, this scrutiny would enable a true industrial policy for big data, deterring misuses and thereby helping to channel AI development in more socially useful directions.30 As AI becomes more invasive and contested, there will be increasing calls for licensure regimes. To be legislatively viable, proposals for licensure need theoretical rigor and practical specificity. What are the broad normative concerns motivating licensure? And what types of uses should be permitted?
Cognizant of these queries, some legislators and regulators have begun to develop an explicitly governance-driven approach to data.31 While not embracing licensure, Sen. Sherrod Brown of Ohio has demonstrated how substantive limits may be enforced via licensure restrictions for large-scale data collection, analysis, and use.32 His Data Accountability and Transparency Act would amount to a Copernican shift in U.S. governance of data, putting civil rights protection at the core of this approach to data regulation.33 This reflects a deep concern about the dangers of discrimination against minoritized or disadvantaged groups, as well as against the “invisible minorities” I have previously described in The Black Box Society.34 For example, the mouse microtremor inference mentioned above may be prevented by the Data Accountability and Transparency Act, which would forbid the calculation of the inference itself by entities that intend to discriminate based on it (or, more broadly, entities that have not demonstrated a personal or public health rationale for creating, disseminating, or using it).35 On the other hand, the inference may be permissible as a way of conducting “public or peer-reviewed scientific, historical, or statistical research in the public interest, but only to the extent such research is not possible using anonymized data.”36 Thus, the generalizable finding may be made public, but its harmful use against an individual would be precluded by preventing a firm with no reasonable method of improving the person’s health from making the inference. This avoids the “runaway data” problem I described in The Black Box Society, where data collection and analysis initially deemed promising and helpful become a bane for individuals stigmatized by them.
Such assurances should enable more societal trust in vital data collection initiatives, like for health research, pandemic response, and data-driven social reform. For a chilling example of a loss of trust in a situation without such protections, we need only turn to the misuse of prescription databases in the U.S. In the 2000s, patients in the U.S. were assured that large databases of prescription drug use would be enormously helpful to them if they ended up in an emergency room away from home, since emergency doctors could have immediate access to this part of their medical record, and avoid potentially dangerous drug interactions. However, that use of the database was not immediately profitable and did not become widespread. Rather, the database became a favored information source of private insurers seeking to deny coverage to individuals on the basis of “preexisting conditions.”37 To avoid such future misuses and abuses of trust, we must develop ways of preventing discriminatory uses of personal data, and of shaping the data landscape generally, rather than continuing with a regime of post hoc, partial, and belated regulation.38
Sensitive to such misuses of data, ethicists have called for restrictions on certain types of AI, with a presumption that it be banned unless licensed. For example, it may be reasonable for states to develop highly specialized databases of the faces of terrorists. But to deploy such powerful technology to ticket speeders or ferret out benefits fraud is inappropriate, like using a sledgehammer to kill a fly.39 A rational government would not license the technology for such purposes, even if it would be entirely reasonable to do so for other purposes (for example, to prevent pandemics via early detection of infection clusters). Nor would it enable many of the forms of discrimination and mischaracterization now enabled by light-to-nonexistent regulation of large-scale data collection, analysis, and use.
The rst order of business for a reformed data economy is to ensure that
inaccurate, irresponsible, and damaging data collection, analysis, and use
are limited. Rather than assuming that data collection, processing, and use
are in general permitted, and that regulators must struggle to catch up and
outlaw particular bad acts, a licensure regime ips the presumption. Under
it, large-scale data collectors, brokers, and analysts would need to apply
for permission for their data collection, analysis, use, and transfer (at the
very least for new data practices, if older ones are “grandfathered” and thus
assumed to be licensed). To that end, a stricter version of the Data Account-
ability and Transparency Act might eventually insist that data brokers obtain
a license from the government in order to engage in the collection, sale,
analysis, and use of data about identiable people.
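A minimal sketch of this flipped presumption appears below: a data practice is denied by default and permitted only if it matches a license already on file for that collector and purpose. The purpose labels, license records, and registry interface are illustrative assumptions, not features of the Data Accountability and Transparency Act or any other enacted statute.

```python
# Sketch of a default-deny licensure check: the practice is impermissible unless
# a current license covering this exact holder and purpose exists. All names and
# structures here are assumptions for illustration.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class License:
    holder: str
    purpose: str            # e.g., "public_health_research"
    expires: date

class LicenseRegistry:
    def __init__(self, licenses):
        self._index = {(lic.holder, lic.purpose): lic for lic in licenses}

    def permits(self, holder: str, purpose: str, today: date) -> bool:
        """Default deny: only an unexpired license for this purpose allows the practice."""
        lic = self._index.get((holder, purpose))
        return lic is not None and lic.expires >= today

registry = LicenseRegistry([
    License("Example Health Analytics", "public_health_research", date(2030, 1, 1)),
])

for purpose in ("public_health_research", "targeted_advertising"):
    allowed = registry.permits("Example Health Analytics", purpose, date.today())
    print(purpose, "->", "permitted" if allowed else "denied (no license on file)")
```

The design choice worth noticing is the default inside permits(): absent an affirmative grant, the answer is no, which is the reverse of the presumption most data law applies today.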
IV. FREE EXPRESSION CONCERNS RAISED BY THE LICENSURE OF LARGE-SCALE PERSONAL DATA COLLECTION, ANALYSIS, AND USE
Whether applied to data or the AI it powers, licensure regimes will face challenges based on free expression rights.40 The ironies here are manifold. The classic scientific process is open, inviting a community of inquirers to build on one another’s works; meanwhile, the leading corporate data hoarders most likely to be covered by the licensing regime proposed here are masters of trade secrecy, aggressively blocking transparency measures. Moreover, it is now clear that the corporate assertion of such alleged constitutional rights results in databases that chill speech and online participation.41 It is one thing to go to a protest when security personnel watch from afar. It is quite another when the police can immediately access your name, address, and job from a quick face scan purchased from an unaccountable private firm.
This may be one reason why the American Civil Liberties Union decisively supported the regulation of Clearview AI (a firm providing facial recognition services) under the Illinois Biometric Information Privacy Act (BIPA), despite Clearview’s insistence (to courts and the public at large) that it has a First Amendment right to gather and analyze data unimpeded by BIPA. If unregulated, the firm’s activities seem far more likely to undermine a robust public sphere than to promote it. Moreover, even if its data processing were granted free expression protections, such protections may be limited by “time, place, and manner” restrictions. In that way, the licensure regime I am proposing is much like permit requirements for parades, which recognize the need to balance the parade organizers’ and marchers’ free expression rights against the public need for safe and orderly streets. Given the privacy and security concerns raised by mass data collection, analysis, and use, restrictions on data practices thus may be subject to only intermediate scrutiny in the U.S.42 Even more sensible is the Canadian rejection of the data aggregator’s free expression claim tout court.43
When an out-of-control data gathering industry’s handiwork can be appropriated by both government and business decision-makers, data and inferences reflect both knowledge and power: They are descriptions of the world that also result in actions done within it. They blur the boundary between speech and conduct, observation and action, in ways that law can no longer ignore. Mass data processing is unlike the ordinary language (or “natural language”) traditionally protected by free expression protections. Natural language is a verbal system of communication and meaning-making. I can state something to a conversation partner and hope that my stated (and perhaps some unstated) meanings are conveyed to that person. By contrast, in computational systems, data are part of a project of “operational language”; their entry into the system produces immediate effects. As Mark Andrejevic explains in Automated Media, there is no interpretive gap in computer processing of information.44 The algorithm fundamentally depends on the binary (1 or 0), supplemented by the operators “and, or, not.” In Andrejevic’s words, “machine ‘language’ … differs from human language precisely because it is non-representational. For the machine, there is no space between sign and referent: there is no ‘lack’ in a language that is complete unto itself. In this respect, machine language is ‘psychotic’ … [envisioning] the perfection of social life through its obliteration.”45 This method of operation is so profoundly different than human language—or the other forms of communication covered by free expression protections—that courts should be exceptionally careful before extending powerful “rights to speak” to the corporate operators of computational systems that routinely abridge human rights to privacy, data protection, and fair and accurate classification.
Unregulated AI is always at risk of distorting reality. Philosophers of social science have explained the limits and constraints algorithmic processing has imposed on social science models and research.46 Scholars in critical data studies have exposed the troubling binaries that have failed to adequately, fairly, and humanely represent individuals. For example, Os Keyes has called data science a “profound threat for queer people” because of its imposition of gender binaries on those who wish to escape them (and who seek societal acceptance of their own affirmation of their gender).47 In this light, it may well be the case that an entity should only process data about data subjects’ gender (and much else) if it has been licensed to do so, with licensure authorities fully cognizant of the concerns that Keyes and other critical data scholars have raised.
The shi to thinking of large-scale data processing as a privilege, instead
of as a right, may seem jarring to American ears, given the expansion of First
Amendment coverage over the past century.
48
However, even in the U.S. it
is roundly conceded that there are certain particularly sensitive pieces of
“information” that cannot simply be collected and disseminated. A die-hard
cyberlibertarian or anarchist may want to copy and paste bank account
numbers or government identication numbers onto anonymous websites,
but that is illegal because complex sociotechnical systems like banks and the
Social Security Administration can only function on a predicate of privacy
and informational control.
49
We need to begin to do the same with respect
to facial recognition and other biometrics, and to expand this caution with
respect to other data that may be just as invasive and stigmatizing. Just as
there is regulation of federally funded human subjects research, similar
patterns of review and limitation must apply to the new forms of human
classication and manipulation now enabled by massive data collection.
50
A licensure regime for big data analytics also puts some controls on the speed and ubiquity of the correlations such systems can make. Just as we may want to prevent automated bots from dominating forums like Twitter, we can and should develop a societal consensus toward limiting the degree to which automated correlations of often biased, partial, and secret data influence our reputations and opportunities.51
This commitment is already a robust part of finance regulation. For example, when credit scores are calculated, the Fair Credit Reporting Act imposes restrictions on the data that can affect them.52 Far from being a forbidden content-based restriction on the “speech” of scoring, such restrictions are vital to a fair credit system.53 The Equal Credit Opportunity Act takes the restrictions further regarding a creditor’s scoring system.54 Such scoring systems may not use certain characteristics—such as race, sex, gender, marital status, national origin, religion, or receipt of public assistance—as a factor regarding a customer’s creditworthiness.55 Far from being a relic of the activist 1970s, restrictions like this are part of contemporary efforts to ensure a fairer credit system.56
European examples abound as well. In Germany, the United Kingdom, and France, agencies cannot use ethnic origin, political opinion, trade union membership, or religious beliefs when calculating credit scores.57 Germany and the United Kingdom also prohibit the use of health data, while France allows the use of health data in credit score calculations.58 Such restrictions might be implemented as part of a licensure regime for use of AI-driven propensity scoring in many fields. For example, authorities may license systems that credibly demonstrate to authorized testing and certification bodies that they do not process data on forbidden grounds, while denying a license to those that do.
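As one illustration of what such a demonstration might involve, the sketch below checks a scoring system’s declared inputs against a list of forbidden attributes before recommending a license. The attribute list, the application format, and the reliance on a declared schema are all assumptions for illustration; a real certification body would also have to probe for proxy variables rather than trust self-reported inputs.

```python
# Minimal sketch of one pre-licensure audit step: reject any scoring system
# whose declared inputs include attributes the jurisdiction forbids. The
# forbidden list and schema format are illustrative assumptions.
from dataclasses import dataclass

FORBIDDEN_ATTRIBUTES = {
    "race", "sex", "gender", "marital_status", "national_origin",
    "religion", "public_assistance_status", "health_data",
}

@dataclass
class ScoringSystemApplication:
    applicant: str
    declared_inputs: set      # features the applicant certifies the model uses

def review_application(app: ScoringSystemApplication):
    """Return (license_recommended, violations) for a declared input schema."""
    violations = sorted(app.declared_inputs & FORBIDDEN_ATTRIBUTES)
    return len(violations) == 0, violations

app = ScoringSystemApplication(
    applicant="ExampleScore Inc.",   # hypothetical applicant
    declared_inputs={"payment_history", "credit_utilization", "marital_status"},
)
ok, violations = review_application(app)
print("recommend license" if ok else f"deny license: forbidden inputs {violations}")
```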
Moreover, credit scores themselves feature as forbidden data in some other determinations. For example, many U.S. states prevent them from being used by employers.59 California, Hawaii, and Massachusetts ban the use of credit scoring for automobile insurance.60 A broad coalition of civil rights and workers’ rights groups reject these algorithmic assessments of personal worth and trustworthiness.61 The logical next step for such activism is to develop systems of evaluation that better respect human dignity and social values in the construction of actionable reputations—those with direct and immediate impact on how we are classified, treated, and evaluated. For example, many have called for the nationalization of at least some credit scores.62 Compared with that proposal, a licensure regime for such algorithmic assessments of propensity to repay is moderate.
To be sure, there will be some difficult judgment calls to be made, as is the case with any licensure regime. But size-based triggers can blunt the impact of licensure regimes on expression, focusing restrictions on firms with the most potential to cause harm. These firms are so powerful that they are almost governmental in their own right.63 The EU’s Digital Services Act proposal, for example, includes obligations that would only apply to platforms that reach 10 percent of the EU population (about 45 million people).64 The Digital Markets Act proposal includes obligations that would only apply to firms that provide “a core platform service that has more than 45 million monthly active end users established or located in the Union and more than 10,000 yearly active business users established in the Union in the last financial year.”65 In the U.S., the California Consumer Privacy Act applies to companies that have data on 50,000 California residents.66 Many U.S. laws requiring security breach notifications generally trigger at around 500-1,000 records breached.67 In short, a nuanced licensing regime can be developed that is primarily aimed at the riskiest collections of data, and only imposes such obligations (or less rigorous ones) on smaller entities as the value and administrability of requirements for larger firms is demonstrated.
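The sketch below shows how such size-based triggers might be expressed in code, using the thresholds cited above (45 million EU users, 50,000 California residents, roughly 500 breached records) purely as illustrative parameters; the duty labels are assumptions, not provisions of any enacted law.

```python
# Illustrative only: maps a firm's scale to the regulatory duties the thresholds
# discussed above would trigger. Numbers mirror the text; labels are assumptions.
def duties_triggered(eu_monthly_users: int,
                     california_residents_in_db: int,
                     records_breached: int) -> list:
    duties = []
    if eu_monthly_users > 45_000_000:           # DSA-style "significant reach" threshold
        duties.append("enhanced EU platform obligations")
    if california_residents_in_db >= 50_000:    # CCPA-style applicability threshold
        duties.append("California consumer privacy duties")
    if records_breached >= 500:                 # common breach-notification trigger
        duties.append("security breach notification")
    return duties

print(duties_triggered(60_000_000, 120_000, 0))
# ['enhanced EU platform obligations', 'California consumer privacy duties']
```

A licensure regime could tier its own requirements the same way, attaching the heaviest ex ante obligations only above the highest thresholds.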
V. CONCLUSION
At the heart of the “grand bargain for big data” I outlined in 2013, followed by the “redescription of health privacy” I proposed in 2014, is a reorientation of privacy and data protection advocacy.68 The state, its agencies, and the corporations they charter only deserve access to more data about persons if they can demonstrate that they are actually using that data to advance human welfare. Without proper assurances that the abuse of data has been foreclosed, citizens should not accede to the large-scale data grabs now underway.
Not only ex post enforcement but also ex ante licensure is necessary to ensure that data are only collected, analyzed, and used for permissible purposes. This article has sketched the first steps toward translating the general normative construct of a “social license” for data use into a specific licensure framework. Of course, more conceptual work remains to be done, both substantively (elaborating grounds for denying a license) and practically (to estimate the resources needed to develop the first iteration of the licensing proposal).69 The consent model has enjoyed the benefits of such conceptual work for decades; now it is time to devote similar intellectual energy to a licensing model.
Ex ante licensure of large-scale data collection, analysis, use, and sharing should become common in jurisdictions committed to enabling democratic governance of personal data. Defining permissible purposes for the licensure of large-scale personal data collection, analysis, use, and sharing will take up an increasing amount of time for regulators, and law enforcers will need new tools to ensure that regulations are actually being followed. The articulation and enforcement of these specifications will prove an essential foundation of an emancipatory industrial policy for AI.
NOTES
1 In this article, I collectively refer to the collection, analysis, transfer, and use of data as “data practices.” The analysis of one set of data can create a new set of data, or inferences; for my purposes, all this follow-on development of data and inferences via analysis is included in the term analysis itself.
2 For earlier examples of this kind of move to supplement ex post regulation with ex ante licensure, see Saule T. Omarova, License to Deal: Mandatory Approval of Complex Financial Products, 90 Wash. U. L. Rev. 63, 63 (2012); Andrew Tutt, An FDA for Algorithms, 69 Admin. L. Rev. 83 (2017); Frank Pasquale, The Black Box Society 181 (2015). The Federal Communications Commission’s power to license spectrum and devices is also a useful precedent here—and one reason my epigraph for this piece gestures to the work and views of one of the most influential FCC commissioners in U.S. history, Newton Minow. Like the airwaves, big data may usefully be considered as a public resource. Salome Viljoen, Democratic Data: A Relational Theory for Data Governance, Yale L.J. (forthcoming, 2021).
3 Siddharth Venkataramakrishnan, Top researchers condemn ‘racially biased’ face-based crime prediction, Fin. Times (June 24, 2020), https://www.ft.com/content/aaa9e654-c962-46c7-8dd0-c2b4af932220 [https://perma.cc/AZU5-VLPD] (“More than 2,000 leading academics and researchers from institutions including Google, MIT, Microsoft and Yale have called on academic journals to halt the publication of studies claiming to have used algorithms to predict criminality. The nascent field of AI-powered ‘criminal recognition’ trains algorithms to recognise complex patterns in the facial features of people categorised by whether or not they have previously committed crimes.”). For more on the problems of face-focused prediction of criminality by AI, see Frank Pasquale, When Machine Learning is Facially Invalid, 61 Comm. ACM 25, 25 (Sept. 2018).
4 See also Datenethikkommission [Data Ethics Commission of the Federal Government of Germany], Opinion of the Data Ethics Commission (2019), 195 (calling for “Preventive official licensing procedures for high-risk algorithmic systems”). The DEC observes that, “[I]n the case of algorithmic systems with regular or appreciable (Level 3) or even significant potential for harm (Level 4), in addition to existing regulations, it would make sense to establish licensing procedures or preliminary checks carried out by supervisory institutions in order to prevent harm to data subjects, certain sections of the population or society as a whole.” Id. Such licensing could also be promulgated by national authorities to enforce the European Union’s proposed AI Act. Frank Pasquale & Gianclaudio Malgieri, Here’s a Model for Reining in AI’s Excesses, N.Y. Times, Aug. 2, 2021, at A19.
5 Gina Kolata, The First Word on Predicting Alzheimer’s, N.Y. Times, Feb. 2, 2021, at D3.
6 P, supra note 2, at 149; Frank Pasquale,
Promoting Data for Well-Being While Minimizing Stig-
ma: Foundations of Equitable AI Policy for Health Pre-
dictions, in D D (Martin Moore &
Damian Tambini, eds., Oxford University Press, 2021).
7 For more on this analogy between big data-driven health prediction and human subjects research, see James Grimmelmann, The Law and Ethics of Experiments on Social Media Users, 13 Colo. Tech. L.J. 219 (2015); Frank Pasquale, Privacy, Autonomy, and Internet Platforms, in Privacy in the Modern Age: The Search for Solutions (Marc Rotenberg et al. eds., 2015).
8 Theodore Rostow, What Happens When an Acquaintance Buys Your Data?: A New Privacy Harm in the Age of Data Brokers, 34 Yale J. on Reg. 667 (2017).
9 Brent Mittelstadt, From Individual to Group Privacy in Biomedical Big Data, in Big Data, Health Law, and Bioethics 175 (I. Glenn Cohen et al. eds., 2018); Sandra Wachter & Brent Mittelstadt, A Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data and AI, 2 Colum. Bus. L. Rev. (2019).
10 Aaron Chou, What’s In The “Black Box”? Balancing Financial Inclusion and Privacy in Digital Consumer Lending, 69 Duke L.J. 1183, 1192 (2020).
11 James Vincent, Woman fired after disabling work app that tracked her movements 24/7, The Verge (May 13, 2015, 7:01 AM), https://www.theverge.com/2015/5/13/8597081/worker-gps-fired-myrna-arias-xora.
12 Ajay Agrawal et al., Prediction Machines: The Simple Economics of Artificial Intelligence (2018).
13 Eric Horvitz & Deirdre Mulligan, Data, privacy, and the greater good, Science, July 17, 2015, at 253.
14 Molly Olmsted, A Prominent Priest Was Outed for Using Grindr. Experts Say It’s a Warning Sign, Slate (July 21, 2021, 7:03 PM), https://slate.com/technology/2021/07/catholic-priest-grindr-data-privacy.html [https://perma.cc/DN5H-J8NA].
15 Ryen W. White et al., Detecting Neurodegenerative Disorders from Web Search Signals, 1 npj Digit. Med. 1 (Apr. 23, 2018), https://www.nature.com/articles/s41746-018-0016-6.pdf [https://perma.cc/9KDK-VBGZ]. In this case, the source of the information was clear: Microsoft itself, which operates Bing, permitted the researchers to study anonymized databases. For an analysis of the import of such data in the U.S., where it is now well beyond the scope of the privacy and security protections guaranteed pursuant to the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health (HITECH) Act, see National Committee on Vital and Health Statistics, Subcommittee on Privacy, Confidentiality, and Security, Health Information Privacy Beyond HIPAA (2019), at https://ncvhs.hhs.gov/wp-content/uploads/2019/07/Report-Framework-for-Health-Information-Privacy.pdf [https://perma.cc/A5JW-QLW8].
16 As I elaborate in a recent book, one critical goal of technology policy should be stopping such arms races. Frank Pasquale, New Laws of Robotics (2020); see also Frank Pasquale, Paradoxes of Privacy in an Era of Asymmetrical Social Control, in Big Data, Crime and Social Control (Aleš Završnik ed., 2018) (on the encryption arms race). As Jathan Sadowski has argued, “cyberhygiene” arguments all too easily degenerate into victim-blaming. Jathan Sadowski, Too Smart (2019).
17 Of course, in the right hands, the data could be quite useful. We may want our doctors to access such information, but we need not let banks, employers, or others use it. That is one foundation of the licensing regime I will describe in Part III: ensuring persons can generally presume data associated with them is being used to advance their well-being, rather than to stigmatize or exclude them.
18 For a powerful critique of extant privacy laws in the U.S. and Europe, see Julie Cohen, Between Truth and Power (2019).
19 On the fallacies inherent in this model in the context of health privacy, see Barbara J. Evans, Much Ado About Data Ownership, 25 Harv. J.L. & Tech. 70 (2011).
20 Julie Cohen, Turning Privacy Inside Out, 20 Theoretical Inquiries L. 1, 1 (2019) (“[P]rivacy’s most enduring institutional failure modes flow from its insistence on placing the individual and individualized control at the center.”); Bart Willem Schermer et al., The Crisis of Consent: How Stronger Legal Protection May Lead to Weaker Consent in Data Protection, 16 Ethics & Info. Tech. 171 (2014); Gabriela Zanfir-Fortuna, Forgetting About Consent: Why the Focus Should Be on ‘Suitable Safeguards’ in Data Protection Law (May 10, 2013) (unpublished working paper) (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2261973 [https://perma.cc/DU9X-97ND]).
21 Leaders of a Beautiful Struggle v. Balt. Police Dep’t, 1:20-cv-00929-RDB (4th Cir., June 24, 2021). A sharply divided Fourth Circuit Court of Appeals ruled the program unconstitutional.
22 Frank Pasquale, Privacy, Antitrust, and Power, 20 Geo. Mason L. Rev. 1009 (2013); Andreas Mundt, Bundeskartellamt prohibits Facebook from combining user data from different sources, Bundeskartellamt (Feb. 7, 2019), https://www.bundeskartellamt.de/SharedDocs/Meldung/EN/Pressemitteilungen/2019/07_02_2019_Facebook.html [https://perma.cc/X6US-RL7A] (“In view of Facebook’s superior market power, an obligatory tick on the box to agree to the company’s terms of use is not an adequate basis for such intensive data processing. The only choice the user has is either to accept the comprehensive combination of data or to refrain from using the social network. In such a difficult situation the user’s choice cannot be referred to as voluntary consent.”).
23 A E W, I U: T
I S  P, D,  C
P (2021).
24 Even in the health care system, where access to such information is supposed to be guaranteed by federal health privacy laws, patients find considerable barriers to the exercise of their rights.
25 For a sophisticated response to this problem, see Václav Janeček & Gianclaudio Malgieri, Commerce in Data and the Dynamically Limited Alienability Rule, 21 German L.J. 924 (2020).
26 Daniel J. Solove & Danielle Keats Citron, Standing and Privacy Harms: A Critique of TransUnion v. Ramirez, 101 B.U. L. Rev. Online 62 (2021).
27 Pam Dixon & Bob Gellman, The Scoring of America (World Privacy Forum, Apr. 2, 2014), http://www.worldprivacyforum.org/wp-content/uploads/2014/04/WPF_Scoring_of_America_April2014_fs.pdf [https://perma.cc/GP6J-L75J]; Danielle Keats Citron & Frank Pasquale, The Scored Society: Due Process for Automated Predictions, 89 Wash. L. Rev. 1, 31 (2014).
28 Margaret Hu, Big Data Blacklisting, 67 Fla. L. Rev. 1735 (2016).
29 Adam Satariano, Europe’s Privacy Law Hasn’t Shown Its Teeth, N.Y. Times, Apr. 28, 2020, at B1.
30 In this way, my proposals here are an extension of the ideas I develop in Frank Pasquale, Data Informed Duties for AI Development, 119 Colum. L. Rev. 1917, 1917 (2019) (“Law should help direct—and not merely constrain—the development of artificial intelligence (AI). One path to influence is the development of standards of care both supplemented and informed by rigorous regulatory guidance.”).
31 For example, the European Data Protection Board is exploring certification. Eur. Data Prot. Bd., Guidelines 1/2018 on certification and identifying certification criteria in accordance with Articles 42 and 43 of the Regulation, Version 3.0 (June 4, 2019), https://edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-12018-certification-and-identifying_en [https://perma.cc/Q365-M99F] (“Before the adoption of the GDPR, the Article 29 Working Party established that certification could play an important role in the accountability framework for data protection. In order for certification to provide reliable evidence of data protection compliance, clear rules setting forth requirements for the provision of certification should be in place. Article 42 of the GDPR provides the legal basis for the development of such rules.”).
32 In future work, I hope to compare Brown’s proposal with the GDPR’s definition of “legitimate purposes.” Under the GDPR, “Personal data shall be … collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes.” Chris Jay Hoofnagle et al., The European Union general data protection regulation: what it is and what it means, 28 Info. & Comm. Tech. L. 65, 77 n. 82 (2019) (quoting Council Directive 2016/679 O.J. (L 119) art. 5(1)(b) (General Data Protection Regulation)).
33 Press Release, Sherrod Brown U.S. Sen. for Ohio, Brown Releases New Proposal That Would Protect Consumers’ Privacy from Bad Actors (June 18, 2020), (https://www.brown.senate.gov/newsroom/press/release/brown-proposal-protect-consumers-privacy [https://perma.cc/74HR-KJLA]).
34 Pasquale, supra note 2. For a deep development of the invisible minorities idea as a right to avoid being stigmatized by certain unreasonable inferences about groups, see Sandra Wachter & Brent Mittelstadt, supra note 9. See also Gianclaudio Malgieri & Jędrzej Niklas, Vulnerable Data Subjects, Computer L. & Sec. Rev. 37 (July 2020).
35 Data Accountability and Transparency Act (DATA Act), S. 20719, 116th Cong. § 102(b)(4) (as proposed to the Senate, 2020) [hereinafter DATA Act]. The proposed act states that data aggregators “shall not collect, use, or share, or cause to be collected, used, or shared, any personal data unless the aggregator can demonstrate that such personal data is strictly necessary to carry out a permissible purpose under section 102.” Id. at § 101. It also states that “a data aggregator shall not … derive or infer data from any element or set of personal data.”
36 Id. at § 102(a)(3). For European efforts to define a similar category, see Eur. Data Prot. Supervisor, Preliminary Opinion on data protection and scientific research, Eur. Data Prot. Supervisor (Jan. 6, 2020), https://edps.europa.eu/data-protection/our-work/publications/opinions/preliminary-opinion-data-protection-and-scientific_en [https://perma.cc/4T3H-7W9W].
37 Chad Terhune, They Know What’s in Your Medicine Cabinet, Bloomberg Businessweek (July 23, 2008, 12:00 AM), https://www.bloomberg.com/news/articles/2008-07-22/they-know-whats-in-your-medicine-cabinet [https://perma.cc/3ZY4-ZVMX]. Of course, the guaranteed issue provisions and ban on preexisting condition limitations in the 2010 Affordable Care Act (ACA) made such practices much less menacing to most consumers. However, the ACA could easily be repealed, or declared null and void by an activist Supreme Court. The rise of authoritarianism in the U.S. should further caution us to understand that no such rights (except of course those of the party in power and its allies) are permanently entrenched.
38 The proposed DATA Act’s “Prohibition On Discriminatory Use of Personal Data” is a method for shaping data collection, analysis, and use in a democratically accountable and forward-thinking way. DATA Act § 104. (“It is unlawful for a data aggregator to collect, use, or share personal data for … commercially contracting for housing, employment, credit, or insurance in a manner that discriminates against or otherwise makes the opportunity unavailable or ordered on different terms on the basis of a protected class.”). As defined by the DATA Act, “protected class” includes classifications based on “biometric information,” which would cover hand-motion monitoring (and many other, more remote forms of data collection and classificatory inference). “Protected class” is defined as “actual or perceived race, color, ethnicity, national origin, religion, sex, gender, gender identity, sexual orientation, familial status, biometric information, lawful source of income, or disability of an individual or group of individuals.” DATA Act § 3(20).
39 For an example of other such potential excessive uses, see Robert Pear, On Disability and on Facebook? Uncle Sam Wants to Watch What You Post, N.Y. Times (Mar. 10, 2019), https://www.nytimes.com/2019/03/10/us/politics/social-security-disability-trump-facebook.html [https://perma.cc/7FZ4-MFLK].
40 These rights claims will be particularly salient in the U.S., whose courts have expanded the scope of the First Amendment to cover many types of activity that would not merit free expression elsewhere, or would merit much less intense free expression protection, given the importance of competing rights to privacy, security, and data protection. On the general issue of data’s categorization as speech, see Jack M. Balkin, Information Fiduciaries and the First Amendment, 49 U.C. Davis L. Rev. 1183 (2016); Jane Bambauer, Is Data Speech?, 66 Stan. L. Rev. 57 (2014); Paul M. Schwartz, Free Speech vs. Information Privacy: Eugene Volokh’s First Amendment Jurisprudence, 52 Stan. L. Rev. 1559 (2000); James M. Hilmert, The Supreme Court Takes on the First Amendment Privacy Conflict and Stumbles: Bartnicki v. Vopper, the Wiretapping Act, and the Notion of Unlawfully Obtained Information, 77 Ind. L.J. 639 (2002); Eric B. Easton, Ten Years After: Bartnicki v. Vopper as a Laboratory for First Amendment Advocacy and Analysis, 50 U. Louisville L. Rev. 287 (2011).
41 Johanna Gunawan et al., The COVID-19 Pandemic and the Technology Trust Gap, 51 Seton Hall L. Rev. 1505 (2021).
42 ACLU v. Clearview AI, Case 20 CH 4353 (Ill. Cir., Aug. 27, 2021), at 10 (“BIPA’s speaker-based exemptions do not appear to favor any particular viewpoint. As BIPA’s restrictions are content neutral, the Court finds that intermediate scrutiny is the proper standard.”).
43 Joint investigation of Clearview AI, Inc. by the Office of the Privacy Commissioner of Canada, the Commission d’accès à l’information du Québec, the Information and Privacy Commissioner for British Columbia, and the Information Privacy Commissioner of Alberta, PIPEDA Findings #2021-001, para. 67, https://www.priv.gc.ca/en/opc-actions-and-decisions/investigations/investigations-into-businesses/2021/pipeda-2021-001/ [https://perma.cc/XN8W-LKV8] (“Clearview has neither explained nor demonstrated how its activities constitute the expression of a message relating to the pursuit of truth, participation in the community or individual self-fulfillment and human flourishing.”).
44 M A, A M (2020).
45 Id. at 72.
46 Paul Erickson et al., How Reason Almost Lost Its Mind: The Strange Career of Cold War Rationality (2013); S.M. Amadae, Game Theory, Cheap Talk and Post-Truth Politics: David Lewis vs. John Searle on reasons for truth-telling, 48 J. Theory Soc. Behav. 306 (2018).
47 Os Keyes, Counting the Countless, Real Life (Apr. 8, 2019), https://reallifemag.com/counting-the-countless/ [https://perma.cc/7M9J-4XFK].
48 Note that the DATA Act has an exception for “de minimis” collection, analysis, and use: “Any person that collects, uses, or shares an amount of personal data that is not de minimis; and does not include an individual who collects, uses, or shares personal data solely for personal reasons.” DATA Act, § 3(8)(A)-(B). The “large-scale” proviso of the licensure regime proposed in this work is also meant to shield smaller players, but on a larger scale.
49 For a broader argument on the limits of First Amendment protection for operational code, see David Golumbia, Code is Not Speech (Apr. 13, 2016) (unpublished draft) (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2764214 [https://perma.cc/G8UG-XMAQ]).
50 For an analysis of the analogy between many forms of big data processing and experiments that are clearly deemed human subjects research, see James Grimmelmann, The Law and Ethics of Experiments on Social Media Users, 13 Colo. Tech. L.J. 219 (2015), https://ctlj.colorado.edu/wp-content/uploads/2015/08/Grimmelman-final.pdf [https://perma.cc/7N6K-JJHG].
51 On policy rationales for limiting automated bot speech, see Frank Pasquale, Preventing a Posthuman Law of Freedom of Expression, in The Perilous Public Square (David Pozen ed., 2020).
52 U.S. Fair Credit Reporting Act (FCRA) § 609, 15
U.S.C. § 1681(g) (2011).
53 The FCRA provides further language limiting what information may be contained in a consumer report. 15 U.S.C. 1681(c) (2011). Consumer reports cannot contain: Title 11 cases over 10 years old; civil suits, judgments, or arrest records over seven years old; paid tax liens over seven years old; accounts placed for collection or charged to profit and loss over seven years old; or any other adverse information, other than criminal convictions, over seven years old. These restrictions have not been successfully challenged as content-based restrictions under the First Amendment.
54 A creditor is defined by the Equal Credit Opportunity Act as those who “extend, renew, or continue credit.” 15 U.S.C. § 1691(a)(e) (2010).
55 15 U.S.C. § 1691(a).
56 In New York, legislation was passed that bans consumer reporting agencies and lenders from using a consumer’s social network to determine creditworthiness. The bill specifically bans companies from using the credit scores of people in an individual’s social network as a variable in determining their credit score. Keshia Clukey, Social Networks Can’t Go Into Credit Decisions Under N.Y. Ban, Bloomberg L. (Nov. 25, 2019, 5:13 PM), https://news.bloomberglaw.com/banking-law/social-networks-cant-go-into-credit-decisions-under-n-y-ban [https://perma.cc/LLA7-FMXB].
57 Nicola Jentzsch, Financial Privacy: An International Comparison of Credit Reporting Systems (2007).
58 Id. The same restriction applies in the U.S.: “A consumer reporting agency shall not furnish … a consumer report that contains medical information (other than medical contact information treated in the manner required under section 1681c(a)(6) of this title) about a consumer, unless—the consumer affirmatively consents, … if furnished for employment purposes, … the information is relevant to the process or effect[s] the employment or credit transaction, … the information to be furnished pertains solely to transactions, accounts, or balances relating to debts arising from the receipt of medical services, products, or devices, … a creditor shall not obtain or use medical information … in connection with any determination of the consumer’s eligibility, or continued eligibility, for credit.” Fair Credit Reporting Act, 15 U.S.C. § 1681b(g) (2020).
59 State Laws Limiting Use of Credit Information for Employment, MicroBilt (2017), https://www.microbilt.com/Cms_Data/Contents/Microbilt/Media/Docs/MicroBilt-State-Laws-Limiting-Use-of-Credit-Information-For-Employment-Version-1-1-03-01-17-.pdf [https://perma.cc/9LSS-GLXZ].
60 Id.
61 Assemb. Floor Analysis, AB-22 Employment: Credit Reports, http://leginfo.legislature.ca.gov/faces/billAnalysisClient.xhtml?bill_id=201120120AB22 [https://perma.cc/XPQ9-QM8U]
(last visited May 13, 2021). Groups include unem-
ployed people, low-income communities, commu-
nities of color, women, domestic violence survivors,
families with children, divorced individuals, and
those with student loans and/or medical bills. N.Y.C. Comm’n on Hum. Rts., Stop Credit Discrimination in Employment Act: Legal Enforcement Guidance (N.Y.C. Comm’n on Hum. Rts. 2015), https://www1.nyc.gov/site/cchr/law/stop-credit-discrimination-employment-act.page [https://perma.cc/4TSX-8ECH].
62 McKenna Moore, Biden wants to change how credit scores work in America, Fortune (Dec. 18, 2020, 11:27 AM), https://fortune.com/2020/12/18/biden-public-credit-agency-economic-justice-personal-finance-racism-credit-scores-equifax-transunion-experian-cfpb/ [https://perma.cc/5Y5P-9NLS]; Amy Traub, Establish a Public Credit Registry, Demos (Apr. 3, 2019), https://www.demos.org/policy-briefs/establish-public-credit-registry [https://perma.cc/R998-DA6G]; The Biden Plan for Investing in Our Communities Through Housing, https://joebiden.com/housing/ (last visited July 12, 2021).
63 Frank Pasquale, From Territorial to Functional Sovereignty: The Case of Amazon, LPE Project (Dec. 6, 2017), https://lpeproject.org/blog/from-territorial-to-functional-sovereignty-the-case-of-amazon/ [https://perma.cc/52YQ-5RBK].
64 Proposal for a Regulation of the European Parliament and of the Council on a Single Market For Digital Services (Digital Services Act), at 3, COM (2020) 825 final (Dec. 15, 2020) (“The operational threshold for service providers in scope of these obligations includes those online platforms with a significant reach in the Union, currently estimated to be amounting to more than 45 million recipients of the service. This threshold is proportionate to the risks brought by the reach of the platforms in the Union; where the Union’s population changes by a certain percentage, the Commission will adjust the number of recipients considered for the threshold, so that it consistently corresponds to 10% of the Union’s population.”); id. at 31 (“Such significant reach should be considered to exist where the number of recipients exceeds an operational threshold set at 45 million, that is, a number equivalent to 10% of the Union population. The operational threshold should be kept up to date through amendments enacted by delegated acts, where necessary.”). Such thresholds reflect a risk-focused model of regulation commended by the German Data Ethics Commission. Data Ethics Comm’n, Fed. Gov’t Ger., Opinion of the Data Ethics Commission (2019), 177.
65 Proposal for a Regulation of the European Parliament and of the Council on contestable and fair markets in the digital sector (Digital Markets Act), at 36–37, COM (2020) 842 final (Dec. 15, 2020) (“A provider of core platform services shall be presumed [an important gateway for business users to reach end users] where it provides a core platform service that has more than 45 million monthly active end users established or located in the Union and more than 10,000 yearly active business users established in the Union in the last financial year.”).
66 Cal. Civ. Code § 1798.140(c)(1)(B) (West 2020) (covering any business that “[a]lone or in combination, annually buys, receives for the business’s commercial purposes, sells, or shares for commercial purposes, alone or in combination, the personal information of 50,000 or more consumers, households, or devices”).
67 See, e.g., 16 C.F.R. § 318.5(b)–(c) (“A vendor of personal health records or PHR related entity shall provide notice to prominent media outlets serving a State or jurisdiction, following the discovery of a breach of security, if the unsecured PHR identifiable health information of 500 or more residents of such State or jurisdiction is, or is reasonably believed to have been, acquired during such breach.”); Security Breach Notification Laws, https://www.ncsl.org/research/telecommunications-and-information-technology/security-breach-notification-laws.aspx [https://perma.cc/BS39-J2RE] (last visited May 13, 2021) (36 states set notification thresholds at 500 or 1,000).
68 Frank Pasquale, Grand Bargains for Big Data: The Emerging Law of Health Information, 72 Md. L. Rev. 682 (2013); Frank Pasquale, Redescribing Health Privacy: The Importance of Information Policy, 14 Hous. J. Health L. & Pol’y 95 (2014).
69 To provide the proper level of resources, the “self-funding agency” model is useful. Certain financial and medical regulators are funded in part via fees paid by regulated entities that must apply to engage in certain activities. For example, fees paid pursuant to the Prescription Drug User Fee Act (PDUFA) fund the Food and Drug Administration (which essentially licenses drugs for sale in the U.S.). For background on this act and its amendments, see Prescription Drug User Fee Amendments, https://www.fda.gov/industry/fda-user-fee-programs/prescription-drug-user-fee-amendments [https://perma.cc/5THX-NTKD] (last updated Aug. 25, 2021).
About the Author
F P is a professor of law at Brooklyn Law School, an aliate
fellow at the Yale Information Society Project, and the Minderoo High Impact
Distinguished Fellow at the AI Now Institute. He is also the chairman of the
Subcommittee on Privacy, Condentiality, and Security of the National Com-
mittee on Vital and Health Statistics at the U.S. Department of Health and
Human Services. Pasquale is an expert on the law of articial intelligence,
algorithms, and machine learning, and author of New Laws of Robotics:
Defending Human Expertise in the Age of AI (Harvard University Press, 2020).
His widely cited book, The Black Box Society (Harvard University Press, 2015),
develops a social theory of reputation, search, and nance, and promotes
pragmatic reforms to improve the information economy, including more
vigorous enforcement of competition and consumer protection law. The Black
Box Society has been reviewed in Science and Nature, published in several
languages, and its h anniversary of publication has been marked with an
international symposium in Big Data & Society.
© 2021, Frank Pasquale.
Acknowledgments
I wish to thank David Baloche, Jameel Jaffer, Margot Kaminski, Amy Kapczynski, Gianclaudio Malgieri, Rafi Martina, Paul Ohm, Paul Schwartz, and
Ari Ezra Waldman for very helpful comments on this work. I, of course, take
responsibility for any faults in it. I also thank the Knight First Amendment
Institute and the Law and Political Economy Project for the opportunity to
be in dialogue on these critical issues.
About the Knight First Amendment Institute
The Knight First Amendment Institute at Columbia University defends the
freedoms of speech and the press in the digital age through strategic litiga-
tion, research, and public education. It promotes a system of free expression
that is open and inclusive, that broadens and elevates public discourse,
and that fosters creativity, accountability, and effective self-government.
knightcolumbia.org
Design: Point Five
Illustration: ©Erik Carter