Licensure as Data Governance
By Frank Pasquale
DATA AND DEMOCRACY
In October 2020, the Knight First Amendment Institute at Columbia University convened a virtual symposium, titled “Data and Democracy,” to investigate how technological advances relating to the collection, analysis, and manipulation of data are affecting democratic processes, and how the law must adapt to ensure the conditions for self-government. This symposium was organized by the Institute’s 2019-2020 Senior Visiting Research Scholar, Yale Law Professor Amy Kapczynski, and co-sponsored by the Law and Political Economy Project at Yale Law School.
The essays in this series were originally presented and discussed at this two-day event. Written by scholars and experts in law, computer science, information studies, political science, and other disciplines, the essays focus on three areas that are both central to democratic governance and directly affected by advancing technologies and ever-increasing data collection: 1) public opinion formation and access to information; 2) the formation and exercise of public power; and 3) the political economy of data.
The symposium was conceptualized by Knight Institute staff, including Jameel Jaffer, Executive Director; Katy Glenn Bass, Research Director; Amy Kapczynski, Senior Visiting Research Scholar; Alex Abdo, Litigation Director; and Larry Siems, Chief of Staff. The essay series was edited by Glenn Bass with additional support from Lorraine Kenny, Communications Director; A. Adam Glenn, Writer/Editor; and Madeline Wood, Research Coordinator.
The full series is available at knightcolumbia.org/research/
[In the late 1950s, the U.S.] government abdicated … responsibility to establish rules, safeguards, and standards relating to the collection and use of personal data for the purpose of directing human behavior. Plainly, all of this might have gone differently. Plenty of people believed at the time that a people machine was entirely and utterly amoral. “My own opinion is that such a thing (a) cannot work, (b) is immoral, (c) should be declared illegal,” [soon-to-be-FCC Chair] Newton Minow had written to Arthur Schlesinger in 1959. “Please advise.”

Jill Lepore, If Then: How the Simulmatics Corporation Invented the Future, 323.
INTRODUCTION
Data protection regulators face a crisis of overwork and underresourcing. Enforcement of privacy laws is too often belated, if it comes at all. Massive firms with myriad data points on tens of millions of people face fines for data misuse and security breaches that are the economic equivalent of a parking ticket. Potentially worse than all these well-recognized barriers to accountability is a known unknown: namely, the black box problem. Even the most diligent regulators and civil society groups have little idea of the full scope and intensity of data extraction, analysis, and use at leading firms, given the triple barriers of trade secrecy, nondisclosure agreements, and technical complexity now effectively hiding their actions from public scrutiny. This crisis is likely to continue unless there is a fundamental shift in the way we regulate the collection, analysis, transfer, and use of data.1
At present, policymakers tend to presume that the data practices of firms are legal, and only investigate and regulate when there is suspicion of wrongdoing. What if the presumption were flipped? That is, what if a firm had to certify that its data practices met clear requirements for security, nondiscrimination, accuracy, appropriateness, and correctability, before it collected, analyzed, or used data?2 Such a standard may not seem administrable now, given the widespread and rapid use of data—and the artificial intelligence (AI) it powers—at firms of all sizes. But such requirements could be applied, at first, to the largest firms’ most troubling data practices, and only gradually (if at all) to smaller ones and less menacing data practices. For example, would it really be troubling to require firms to demonstrate basic security practices once they have accumulated sensitive data on over 1 million people, before they continue to collect even more? Scholars have argued that certain data practices should not be permitted at all.3 Rather than expecting underfunded, understaffed regulators to overcome the monumental administrative and black box problems mentioned above, responsibility could be built into the structure of data-driven industries via licensure schemes that require certain standards to be met before large-scale data practices expand even further.4
To give a concrete example motivating this flipped presumption about data practices, consider the emergence of health inferences from data that is not, on its face, health-predictive. For instance, an AI program, reviewing only writing samples, “predicted, with 75 percent accuracy, who would get Alzheimer’s disease.”5 This type of inference could be used in subtle or secretive ways by the firms making it, as well as by employers, marketers, financial institutions, and other important decision-makers.6 Such predictions may have massive impacts on those projected to have Alzheimer’s, including denial of life insurance or long-term care insurance, denial of employment, or loss of other opportunities. Even where such uses of the data are illegal, complex and expensive legal systems may make it very difficult to enforce one’s rights. Governments should ensure ex ante that predictions are done and used in a responsible way, much as federally funded research is often channeled through institutional review boards in order to respect ethical and legal standards.7
A licensure regime for data and the AI it powers would enable citizens to democratically shape data’s scope and proper use, rather than resigning ourselves to being increasingly influenced and shaped by forces beyond our control. To ground the case for more ex ante regulation, Part I describes the expanding scope of data collection, analysis, and use, and the threats that that scope poses to data subjects. Part II critiques consent-based models of data protection, while Part III examines the substantive foundation of licensure models. Part IV addresses a key challenge to my approach: the free expression concerns raised by the licensure of large-scale personal data collection, analysis, and use. Part V concludes with reflections on the opportunities created by data licensure frameworks and potential limitations upon them.
I. THE EXPANDING SCOPE OF DATA COLLECTION, ANALYSIS, AND USE
As data collection becomes more prevalent, massive firms are privy to exceptionally comprehensive and intimate details about individuals.8 These can include transport, financial, retail, health, leisure, entertainment, location, and many other kinds of data. Once large enough stores of data are created, there are increasing opportunities to create inferences about persons based on extrapolations from both humanly recognizable and ad hoc, machine learning-recognizable groups.9
Much observation can occur without persons’ consent. Even when consent is obtained, improper data collection may occur. Increasingly desperate individuals may be effectively coerced via their circumstances to permit comprehensive, 360-degree surveillance of key aspects of their lives. For example, many people now cannot access credit because of a “thin credit file”—industry lingo for someone without much of a repayment history. Fintech firms promise “financial inclusion” for those willing to give lenders access to their social media activity and other intimate data.10 Critics characterize this “opportunity” as predatory inclusion, and it is easy to see why: Those enticed into the sphere of payday lenders may not only contract into unsustainable debt-repayment schemes but may also become marks for other exploitative businesses, such as for-profit universities or unlicensed rehab centers.
The economic logic here (giving up more privacy in exchange for lower interest rates or other favorable terms) is, in principle, illimitable. For example, a firm may give a loan applicant a reduced rate on a car loan, if she allows it (and its agents) to download and then analyze all information on her mobile phone during the term of the loan, and to resell the data. Such borrowers may never even know what is done with their data, thanks to all-pervasive trade secrecy in the industry. Tracker apps on cell phones may allow firms to record their employees’ location at all times. According to one aggrieved worker, her boss “bragged that he knew how fast she was driving at specific moments ever since she had installed [an] app on her phone.”11
Even when such comprehensive surveillance is not consented to, AI can operate as a “prediction machine,” analyzing data to make damaging inferences about individuals.12 These inferences may be entirely unexpected, based on correlations that can only be found in vast troves of data.13 For example, a person’s proclivity to be depressed may be related to the apps or websites they visit (or how they use those apps and websites). The websites themselves may keep such data, or it may be collected by third parties with commercial or other relationships with the sites or apps.
Correlations based on how a person uses their phone or computer may be entirely unexpected. For example, a high-ranking Catholic cleric in the U.S. was recently reported to be a user of the gay dating site Grindr by journalists at the digital publication named The Pillar, based on computing data. As journalist Molly Olmsted explained:

According to one privacy engineer who has worked on issues related to location data, Pillar (or the group that had offered CNA [the Catholic News Agency] the data back in 2018) probably purchased a data set from a data broker, which in turn had likely purchased the data from a third-party ad network that Grindr uses. Grindr itself would not have been the source of the data, but the ad network would have been given full access to the users’ information as long as they agreed to Grindr’s terms of services. (In 2018, Grindr, which uses highly granular location information, was found to have shared users’ anonymized locations, race, sexual preferences, and even HIV status with third-party analytic firms.)14
Whatever your views about the cleric in this case, the generalized exposure of a person’s dating practices, or other intimate inferences based on location-based data, is something a mature privacy law should prevent.
Researchers have also analyzed certain activities of people who extensively searched for information about Parkinson’s disease on Bing, including their mouse movements six months before they entered those search terms.15 Most users of the internet are probably unaware that not just what they click on, but how fast and smoothly they move their mouse to do so, can be recorded and traced by the sites they are using. The group of Bing users who searched for Parkinson’s—which it is probably safe to assume is far more likely to have Parkinson’s than the population as a whole—tended to have certain tremors in their mouse movements distinct from other searchers. These tremor patterns were undetectable by humans—only machine learning could distinguish the group identified to have a higher propensity to have Parkinson’s, based in part on microsecond-by-microsecond differences in speed and motion of hand movement.
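To make the mechanics of such inference concrete, the sketch below trains a classifier on synthetic cursor traces. This is not the method used in the Bing study; the motion features, the 5 Hz jitter standing in for tremor, and the gradient-boosted model are all illustrative assumptions. The point is only that a handful of motion statistics, invisible to any human observer, can be enough to sort users into a higher-risk group.

```python
# Illustrative sketch only: NOT the pipeline from the cited study. It shows how
# a classifier can separate cursor traces with subtle high-frequency jitter
# (a stand-in for tremor) from ordinary traces, using assumed motion features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def tremor_features(t, xs, ys):
    """Summarize one cursor trace as a few motion statistics (assumed features)."""
    dt = np.diff(t)
    speed = np.hypot(np.diff(xs), np.diff(ys)) / dt
    accel = np.diff(speed) / dt[1:]
    reversals = (np.sign(accel[1:]) != np.sign(accel[:-1])).mean()  # rapid speed flips
    return np.array([speed.mean(), speed.std(), np.abs(accel).mean(), reversals])

def synthetic_trace(tremor, n=200):
    """Fake cursor trace sampled at roughly 60-120 Hz; tremor adds ~5 Hz jitter."""
    t = np.cumsum(rng.uniform(0.008, 0.016, n))
    xy = np.cumsum(rng.normal(0.0, 2.0, (n, 2)), axis=0)
    if tremor:
        xy += 1.5 * np.sin(2 * np.pi * 5.0 * t)[:, None]
    return t, xy[:, 0], xy[:, 1]

X = np.array([tremor_features(*synthetic_trace(tremor=i % 2 == 0)) for i in range(600)])
y = np.array([i % 2 == 0 for i in range(600)], dtype=int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("held-out AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```

Nothing in this toy pipeline requires consent, notice, or any medical context; the same handful of statistics could be computed by any site that logs cursor positions.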
There is at present no widely available defense to such detection—no close-at-hand, privacy-enhancing strategy that can prevent such technologies of inference, once developed, from classifying someone as likely to develop grave illness. Perhaps a clever technologist could develop tools to “fuzz” tremor movements, or to smooth or normalize their transmission so that they do not signal abnormality. But we should not expect internet users to defend themselves against such invasive categorization by joining an arms race of deflection and rediscovery, obfuscation and clarification, encryption and decryption.16 It is a waste of our time. And those of higher socioeconomic status have many more resources at hand to engage in such arms races, thus adding the insult of rising inequality to the injury of privacy harm.17
Moreover, a patchwork of weak and often underenforced privacy laws is no match for the threats posed by large-scale data processing, which can be used to either overtly or secretly rank, rate, and evaluate people, often to their detriment and unfairly.18 Without a society-wide commitment to fair data practices, a troubling era of digital discrimination will be entrenched. The more data about signals of future distress, illness, or disability are available, the better AI will predict these conditions, enabling unscrupulous actors to take advantage of them.
II. THE IMPRACTICALITY OF CONSENT-BASED MODELS IN AN AGE OF BIG DATA
Most models of data regulation begin with consent, shifting responsibility to data subjects to decide to whom they should grant their data, and to whom they should deny it. On the consent-based view, it is up to the data subject to vet the reliability of entities seeking access to her data and monitor their ongoing abidance with the promises they made when they obtained it. A consent-based model makes sense as part of a contractarian and propertarian view of legal order: Data are like objects, property of a person who may freely contract to give or share the data with others.19 On this view, just as a corporation needs consent to, say, take a person’s property, it needs consent to take personal data. Once a contract is struck, it governs the future exchange and use of the data.
This consent-based approach has multiple infirmities.20 Much data arises out of observation unrestricted by even theoretical contracts. To give an example: A Google user may “consent” to data collection while using the service, but no one is asked before the cameras on Google’s street-photographing cars roll down the road to grab data for its mapping service. The firm does blot out faces when publishing the images to the internet, but the principle remains: All manner of information can be garnered by microphones, cameras, and sensors trained on the public at large, and connected to particular persons, automobiles, or households via massive facial recognition databases. Aerial surveillance can trace a person’s every outdoor step and movement from home and back each day, as Baltimore’s recent “spy plane” litigation has revealed.21
Even for data that can be practically bound by contract, terms-of-service agreements are continually subject to change. These changes almost always favor service providers—despite the serious lock-in costs and reliance interests of users. If one is dealing with a monopoly or must-have service, there is no real choice but to accept its terms.22 Even when choice is superficially available, terms of service are by and large similar and imposed as a fait accompli: The user either accepts them or does not get to use the service. Sometimes, under exceedingly gentle pressure from regulators, firms will magnanimously offer assurances about certain limits on the uses of data.23
However, data subjects are now under surveillance by so many firms that it is impossible for them to audit such assurances comprehensively. How can a person with a job and family to take care of try to figure out which of thousands of data controllers has information about them, has correct information, and has used it in a fair and rigorous manner? In the U.S., even the diligent will all too often run into the brick walls of trade secrecy, proprietary business methods, and malign neglect if they do so much as ask about how their data has been used, with whom it has been shared, and how it has been analyzed.24 Europeans may make “subject access requests,” but there are far too many data gathering and data processing firms for the average person to conduct reviews of their results in a comprehensive way.
The analogy between data and property also breaks down in an era of digitization, when data can be so easily copied, transferred, stored, and backed up.25 Securing a house is a relatively easy matter compared with securing one’s data. Even the most conscientious data subjects need only slip once, failing to read a key clause in terms of service, or transacting with an unreliable or insecure counterparty. Then critical, discrediting, disadvantaging, or embarrassing data about them could end up copied and recopied, populating countless databases. And even when the data subject has clearly been wronged by a data controller or processor, the judiciary may make litigation to obtain redress difficult or impossible.26
As data and its analysis, sharing, and inferences proliferate, the consent model becomes less and less realistic. There is simply too much for any individual to keep track of. Nor have data unions risen to the challenge, given the immense difficulty of attaining any kind of bargaining leverage vis-à-vis first-party data collectors. There are myriad data gatherers, hundreds of scoring entities,27 and blacklists covering housing, voting, travel, and employment.28 Even if consent-based regimes are well-administered, they can result in data gaps that impede, for instance, both medical research and opportunities for clinical care. Forced into an environment where few adverse uses of compromising data and inferences are forbidden (and where the penalties for such uses are not sufficient to deter wrongdoers), data subjects easily forego valuable opportunities (such as future AI-enabled diagnoses) or eschew low-risk chances to contribute to the public good by sharing data (for research studies).
Moreover, we cannot reasonably expect good administration of consent-based regimes in many areas. In the U.S., regulatory agencies are ill-suited to enforce others’ contracts. Even when they do bring cases on the basis of deception claims, the penalties for failure to comply have frequently been dismissed as a mere cost of doing business. First Amendment defenses may also complicate any lawsuit predicated on an effort to stop the transfer or analysis of data and inferences. In Europe, while data protection authorities are empowered by law to advance the interests of data subjects via the General Data Protection Regulation (GDPR), they have in practice proven reluctant to impose the types of penalties necessary to ensure adherence to the law.29
III. BEYOND CONSENT: THE EX ANTE REGULATORY IMPERATIVE
One path to large-scale data processing that is more responsive to the public interest is to ensure that proper scrutiny occurs before the collection, analysis, and use of data. If enacted via a licensure regime, this scrutiny would enable a true industrial policy for big data, deterring misuses and thereby helping to channel AI development in more socially useful directions.30 As AI becomes more invasive and contested, there will be increasing calls for licensure regimes. To be legislatively viable, proposals for licensure need theoretical rigor and practical specificity. What are the broad normative concerns motivating licensure? And what types of uses should be permitted?
Cognizant of these queries, some legislators and regulators have begun to develop an explicitly governance-driven approach to data.31 While not embracing licensure, Sen. Sherrod Brown of Ohio has demonstrated how substantive limits may be enforced via licensure restrictions for large-scale data collection, analysis, and use.32 His Data Accountability and Transparency Act would amount to a Copernican shift in U.S. governance of data, putting civil rights protection at the core of this approach to data regulation.33 This reflects a deep concern about the dangers of discrimination against minoritized or disadvantaged groups, as well as against the “invisible minorities” I have previously described in The Black Box Society.34 For example, the mouse microtremor inference mentioned above may be prevented by the Data Accountability and Transparency Act, which would forbid the calculation of the inference itself by entities that intend to discriminate based on it (or, more broadly, entities that have not demonstrated a personal or public health rationale for creating, disseminating, or using it).35 On the other hand, the inference may be permissible as a way of conducting “public or peer-reviewed scientific, historical, or statistical research in the public interest, but only to the extent such research is not possible using anonymized data.”36 Thus, the generalizable finding may be made public, but its harmful use against an individual would be precluded by preventing a firm with no reasonable method of improving the person’s health from making the inference. This avoids the “runaway data” problem I described in The Black Box Society, where data collection and analysis initially deemed promising and helpful become a bane for individuals stigmatized by them.
Such assurances should enable more societal trust in vital data collection initiatives, like for health research, pandemic response, and data-driven social reform. For a chilling example of a loss of trust in a situation without such protections, we need only turn to the misuse of prescription databases in the U.S. In the 2000s, patients in the U.S. were assured that large databases of prescription drug use would be enormously helpful to them if they ended up in an emergency room away from home, since emergency doctors could have immediate access to this part of their medical record, and avoid potentially dangerous drug interactions. However, that use of the database was not immediately profitable and did not become widespread. Rather, the database became a favored information source of private insurers seeking to deny coverage to individuals on the basis of “preexisting conditions.”37 To avoid such future misuses and abuses of trust, we must develop ways of preventing discriminatory uses of personal data, and of shaping the data landscape generally, rather than continuing with a regime of post hoc, partial, and belated regulation.38
Sensitive to such misuses of data, ethicists have called for restrictions on certain types of AI, with a presumption that it be banned unless licensed. For example, it may be reasonable for states to develop highly specialized databases of the faces of terrorists. But to deploy such powerful technology to ticket speeders or ferret out benefits fraud is inappropriate, like using a sledgehammer to kill a fly.39 A rational government would not license the technology for such purposes, even if it would be entirely reasonable to do so for other purposes (for example, to prevent pandemics via early detection of infection clusters). Nor would it enable many of the forms of discrimination and mischaracterization now enabled by light-to-nonexistent regulation of large-scale data collection, analysis, and use.
The rst order of business for a reformed data economy is to ensure that
inaccurate, irresponsible, and damaging data collection, analysis, and use
are limited. Rather than assuming that data collection, processing, and use
are in general permitted, and that regulators must struggle to catch up and
outlaw particular bad acts, a licensure regime ips the presumption. Under
it, large-scale data collectors, brokers, and analysts would need to apply
for permission for their data collection, analysis, use, and transfer (at the
very least for new data practices, if older ones are “grandfathered” and thus
assumed to be licensed). To that end, a stricter version of the Data Account-
ability and Transparency Act might eventually insist that data brokers obtain
a license from the government in order to engage in the collection, sale,
analysis, and use of data about identiable people.
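A minimal sketch of this flipped presumption appears below: a data practice is denied by default and permitted only if it matches a license already on file for that collector and purpose. The purpose labels, license records, and registry interface are illustrative assumptions, not features of the Data Accountability and Transparency Act or any other enacted statute.

```python
# Sketch of a default-deny licensure check: the practice is impermissible unless
# a current license covering this exact holder and purpose exists. All names and
# structures here are assumptions for illustration.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class License:
    holder: str
    purpose: str            # e.g., "public_health_research"
    expires: date

class LicenseRegistry:
    def __init__(self, licenses):
        self._index = {(lic.holder, lic.purpose): lic for lic in licenses}

    def permits(self, holder: str, purpose: str, today: date) -> bool:
        """Default deny: only an unexpired license for this purpose allows the practice."""
        lic = self._index.get((holder, purpose))
        return lic is not None and lic.expires >= today

registry = LicenseRegistry([
    License("Example Health Analytics", "public_health_research", date(2030, 1, 1)),
])

for purpose in ("public_health_research", "targeted_advertising"):
    allowed = registry.permits("Example Health Analytics", purpose, date.today())
    print(purpose, "->", "permitted" if allowed else "denied (no license on file)")
```

The design choice worth noticing is the default inside permits(): absent an affirmative grant, the answer is no, which is the reverse of the presumption most data law applies today.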
IV. FREE EXPRESSION CONCERNS RAISED BY THE LICENSURE OF LARGE-SCALE PERSONAL DATA COLLECTION, ANALYSIS, AND USE
Whether applied to data or the AI it powers, licensure regimes will face challenges based on free expression rights.40 The ironies here are manifold. The classic scientific process is open, inviting a community of inquirers to build on one another’s works; meanwhile, the leading corporate data hoarders most likely to be covered by the licensing regime proposed here are masters of trade secrecy, aggressively blocking transparency measures. Moreover, it is now clear that the corporate assertion of such alleged constitutional rights results in databases that chill speech and online participation.41 It is one thing to go to a protest when security personnel watch from afar. It is quite another when the police can immediately access your name, address, and job from a quick face scan purchased from an unaccountable private firm.
This may be one reason why the American Civil Liberties Union decisively supported the regulation of Clearview AI (a firm providing facial recognition services) under the Illinois Biometric Information Privacy Act (BIPA), despite Clearview’s insistence (to courts and the public at large) that it has a First Amendment right to gather and analyze data unimpeded by BIPA. If unregulated, the firm’s activities seem far more likely to undermine a robust public sphere than to promote it. Moreover, even if its data processing were granted free expression protections, such protections may be limited by “time, place, and manner” restrictions. In that way, the licensure regime I am proposing is much like permit requirements for parades, which recognize the need to balance the parade organizers’ and marchers’ free expression rights against the public need for safe and orderly streets. Given the privacy and security concerns raised by mass data collection, analysis, and use, restrictions on data practices thus may be subject to only intermediate scrutiny in the U.S.42 Even more sensible is the Canadian rejection of the data aggregator’s free expression claim tout court.43
When an out-of-control data gathering industry’s handiwork can be appropriated by both government and business decision-makers, data and inferences reflect both knowledge and power: They are descriptions of the world that also result in actions done within it. They blur the boundary between speech and conduct, observation and action, in ways that law can no longer ignore. Mass data processing is unlike the ordinary language (or “natural language”) traditionally protected by free expression protections. Natural language is a verbal system of communication and meaning-making. I can state something to a conversation partner and hope that my stated (and perhaps some unstated) meanings are conveyed to that person. By contrast, in computational systems, data are part of a project of “operational language”; their entry into the system produces immediate effects. As Mark Andrejevic explains in Automated Media, there is no interpretive gap in computer processing of information.44 The algorithm fundamentally depends on the binary (1 or 0), supplemented by the operators “and, or, not.” In Andrejevic’s words, “machine ‘language’ … differs from human language precisely because it is non-representational. For the machine, there is no space between sign and referent: there is no ‘lack’ in a language that is complete unto itself. In this respect, machine language is ‘psychotic’ … [envisioning] the perfection of social life through its obliteration.”45 This method of operation is so profoundly different than human language—or the other forms of communication covered by free expression protections—that courts should be exceptionally careful before extending powerful “rights to speak” to the corporate operators of computational systems that routinely abridge human rights to privacy, data protection, and fair and accurate classification.
Unregulated AI is always at risk of distorting reality. Philosophers of social science have explained the limits and constraints algorithmic processing has imposed on social science models and research.46 Scholars in critical data studies have exposed the troubling binaries that have failed to adequately, fairly, and humanely represent individuals. For example, Os Keyes has called data science a “profound threat for queer people” because of its imposition of gender binaries on those who wish to escape them (and who seek societal acceptance of their own affirmation of their gender).47 In this light, it may well be the case that an entity should only process data about data subjects’ gender (and much else) if it has been licensed to do so, with licensure authorities fully cognizant of the concerns that Keyes and other critical data scholars have raised.
The shi to thinking of large-scale data processing as a privilege, instead
of as a right, may seem jarring to American ears, given the expansion of First
Amendment coverage over the past century.
48
However, even in the U.S. it
is roundly conceded that there are certain particularly sensitive pieces of
“information” that cannot simply be collected and disseminated. A die-hard
cyberlibertarian or anarchist may want to copy and paste bank account
numbers or government identication numbers onto anonymous websites,
but that is illegal because complex sociotechnical systems like banks and the
Social Security Administration can only function on a predicate of privacy
and informational control.
49
We need to begin to do the same with respect
to facial recognition and other biometrics, and to expand this caution with
respect to other data that may be just as invasive and stigmatizing. Just as
there is regulation of federally funded human subjects research, similar
patterns of review and limitation must apply to the new forms of human
classication and manipulation now enabled by massive data collection.
50
A licensure regime for big data analytics also puts some controls on the speed and ubiquity of the correlations such systems can make. Just as we may want to prevent automated bots from dominating forums like Twitter, we can and should develop a societal consensus toward limiting the degree to which automated correlations of often biased, partial, and secret data influence our reputations and opportunities.51
This commitment is already a robust part of finance regulation. For example, when credit scores are calculated, the Fair Credit Reporting Act imposes restrictions on the data that can affect them.52 Far from being a forbidden content-based restriction on the “speech” of scoring, such restrictions are vital to a fair credit system.53 The Equal Credit Opportunity Act takes the restrictions further regarding a creditor’s scoring system.54 Such scoring systems may not use certain characteristics—such as race, sex, gender, marital status, national origin, religion, or receipt of public assistance—as a factor regarding a customer’s creditworthiness.55 Far from being a relic of the activist 1970s, restrictions like this are part of contemporary efforts to ensure a fairer credit system.56
European examples abound as well. In Germany, the United Kingdom, and France, agencies cannot use ethnic origin, political opinion, trade union membership, or religious beliefs when calculating credit scores.57 Germany and the United Kingdom also prohibit the use of health data, while France allows the use of health data in credit score calculations.58 Such restrictions might be implemented as part of a licensure regime for use of AI-driven propensity scoring in many fields. For example, authorities may license systems that credibly demonstrate to authorized testing and certification bodies that they do not process data on forbidden grounds, while denying a license to those that do.
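As one illustration of what such a demonstration might involve, the sketch below checks a scoring system’s declared inputs against a list of forbidden attributes before recommending a license. The attribute list, the application format, and the reliance on a declared schema are all assumptions for illustration; a real certification body would also have to probe for proxy variables rather than trust self-reported inputs.

```python
# Minimal sketch of one pre-licensure audit step: reject any scoring system
# whose declared inputs include attributes the jurisdiction forbids. The
# forbidden list and schema format are illustrative assumptions.
from dataclasses import dataclass

FORBIDDEN_ATTRIBUTES = {
    "race", "sex", "gender", "marital_status", "national_origin",
    "religion", "public_assistance_status", "health_data",
}

@dataclass
class ScoringSystemApplication:
    applicant: str
    declared_inputs: set      # features the applicant certifies the model uses

def review_application(app: ScoringSystemApplication):
    """Return (license_recommended, violations) for a declared input schema."""
    violations = sorted(app.declared_inputs & FORBIDDEN_ATTRIBUTES)
    return len(violations) == 0, violations

app = ScoringSystemApplication(
    applicant="ExampleScore Inc.",   # hypothetical applicant
    declared_inputs={"payment_history", "credit_utilization", "marital_status"},
)
ok, violations = review_application(app)
print("recommend license" if ok else f"deny license: forbidden inputs {violations}")
```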
Moreover, credit scores themselves feature as forbidden data in some other determinations. For example, many U.S. states prevent them from being used by employers.59 California, Hawaii, and Massachusetts ban the use of credit scoring for automobile insurance.60 A broad coalition of civil rights and workers’ rights groups reject these algorithmic assessments of personal worth and trustworthiness.61 The logical next step for such activism is to develop systems of evaluation that better respect human dignity and social values in the construction of actionable reputations—those with direct and immediate impact on how we are classified, treated, and evaluated. For example, many have called for the nationalization of at least some credit scores.62 Compared with that proposal, a licensure regime for such algorithmic assessments of propensity to repay is moderate.
To be sure, there will be some difficult judgment calls to be made, as is the case with any licensure regime. But size-based triggers can blunt the impact of licensure regimes on expression, focusing restrictions on firms with the most potential to cause harm. These firms are so powerful that they are almost governmental in their own right.63 The EU’s Digital Services Act proposal, for example, includes obligations that would only apply to platforms that reach 10 percent of the EU population (about 45 million people).64 The Digital Markets Act proposal includes obligations that would only apply to firms that provide “a core platform service that has more than 45 million monthly active end users established or located in the Union and more than 10,000 yearly active business users established in the Union in the last financial year.”65 In the U.S., the California Consumer Privacy Act applies to companies that have data on 50,000 California residents.66 Many U.S. laws requiring security breach notifications generally trigger at around 500-1,000 records breached.67 In short, a nuanced licensing regime can be developed that is primarily aimed at the riskiest collections of data, and only imposes such obligations (or less rigorous ones) on smaller entities as the value and administrability of requirements for larger firms is demonstrated.
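The sketch below shows how such size-based triggers might be expressed in code, using the thresholds cited above (45 million EU users, 50,000 California residents, roughly 500 breached records) purely as illustrative parameters; the duty labels are assumptions, not provisions of any enacted law.

```python
# Illustrative only: maps a firm's scale to the regulatory duties the thresholds
# discussed above would trigger. Numbers mirror the text; labels are assumptions.
def duties_triggered(eu_monthly_users: int,
                     california_residents_in_db: int,
                     records_breached: int) -> list:
    duties = []
    if eu_monthly_users > 45_000_000:           # DSA-style "significant reach" threshold
        duties.append("enhanced EU platform obligations")
    if california_residents_in_db >= 50_000:    # CCPA-style applicability threshold
        duties.append("California consumer privacy duties")
    if records_breached >= 500:                 # common breach-notification trigger
        duties.append("security breach notification")
    return duties

print(duties_triggered(60_000_000, 120_000, 0))
# ['enhanced EU platform obligations', 'California consumer privacy duties']
```

A licensure regime could tier its own requirements the same way, attaching the heaviest ex ante obligations only above the highest thresholds.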
V. CONCLUSION
At the heart of the “grand bargain for big data” I outlined in 2013, followed by the “redescription of health privacy” I proposed in 2014, is a reorientation of privacy and data protection advocacy.68 The state, its agencies, and the corporations they charter only deserve access to more data about persons if they can demonstrate that they are actually using that data to advance human welfare. Without proper assurances that the abuse of data has been foreclosed, citizens should not accede to the large-scale data grabs now underway.
Not only ex post enforcement but also ex ante licensure is necessary to ensure that data are only collected, analyzed, and used for permissible purposes. This article has sketched the first steps toward translating the general normative construct of a “social license” for data use into a specific licensure framework. Of course, more conceptual work remains to be done, both substantively (elaborating grounds for denying a license) and practically (to estimate the resources needed to develop the first iteration of the licensing proposal).69 The consent model has enjoyed the benefits of such conceptual work for decades; now it is time to devote similar intellectual energy to a licensing model.
Ex ante licensure of large-scale data collection, analysis, use, and sharing should become common in jurisdictions committed to enabling democratic governance of personal data. Defining permissible purposes for the licensure of large-scale personal data collection, analysis, use, and sharing will take up an increasing amount of time for regulators, and law enforcers will need new tools to ensure that regulations are actually being followed. The articulation and enforcement of these specifications will prove an essential foundation of an emancipatory industrial policy for AI.
NOTES
1 In this article, I collectively refer to the collection, analysis, transfer, and use of data as “data practices.” The analysis of one set of data can create a new set of data, or inferences; for my purposes, all this follow-on development of data and inferences via analysis is included in the term analysis itself.
2 For earlier examples of this kind of move to supplement ex post regulation with ex ante licensure, see Saule T. Omarova, License to Deal: Mandatory Approval of Complex Financial Products, 90 Wash. U. L. Rev. 63, 63 (2012); Andrew Tutt, An FDA for Algorithms, 69 Admin. L. Rev. 83 (2017); Frank Pasquale, The Black Box Society 181 (2015). The Federal Communications Commission’s power to license spectrum and devices is also a useful precedent here—and one reason my epigraph for this piece gestures to the work and views of one of the most influential FCC commissioners in U.S. history, Newton Minow. Like the airwaves, big data may usefully be considered as a public resource. Salome Viljoen, Democratic Data: A Relational Theory for Data Governance, Yale L.J. (forthcoming, 2021).
3 Siddharth Venkataramakrishnan, Top researchers condemn ‘racially biased’ face-based crime prediction, Fin. Times (June 24, 2020), https://www.ft.com/content/aaa9e654-c962-46c7-8dd0-c2b4af932220 [https://perma.cc/AZU5-VLPD] (“More than 2,000 leading academics and researchers from institutions including Google, MIT, Microsoft and Yale have called on academic journals to halt the publication of studies claiming to have used algorithms to predict criminality. The nascent field of AI-powered ‘criminal recognition’ trains algorithms to recognise complex patterns in the facial features of people categorised by whether or not they have previously committed crimes.”). For more on the problems of face-focused prediction of criminality by AI, see Frank Pasquale, When Machine Learning is Facially Invalid, 61 Comm. ACM 25, 25 (Sept. 2018).
4 See also Datenethikkommission [Data Ethics Commission of the Federal Government of Germany], Opinion of the Data Ethics Commission (2019), 195 (calling for “Preventive official licensing procedures for high-risk algorithmic systems”). The DEC observes that, “[I]n the case of algorithmic systems with regular or appreciable (Level 3) or even significant potential for harm (Level 4), in addition to existing regulations, it would make sense to establish licensing procedures or preliminary checks carried out by supervisory institutions in order to prevent harm to data subjects, certain sections of the population or society as a whole.” Id. Such licensing could also be promulgated by national authorities to enforce the European Union’s proposed AI Act. Frank Pasquale & Gianclaudio Malgieri, Here’s a Model for Reining in AI’s Excesses, N.Y. Times, Aug. 2, 2021, at A19.
5 Gina Kolata, The First Word on Predicting Alzheimer’s, N.Y. Times, Feb. 2, 2021, at D3.
6 P, supra note 2, at 149; Frank Pasquale,
Promoting Data for Well-Being While Minimizing Stig-
ma: Foundations of Equitable AI Policy for Health Pre-
dictions, in D D (Martin Moore &
Damian Tambini, eds., Oxford University Press, 2021).
7 For more on this analogy between big data-driven health prediction and human subjects research, see James Grimmelmann, The Law and Ethics of Experiments on Social Media Users, 13 Colo. Tech. L.J. 219 (2015); Frank Pasquale, Privacy, Autonomy, and Internet Platforms, in Privacy in the Modern Age: The Search for Solutions (Marc Rotenberg et al. eds., 2015).
8 Theodore Rostow, What Happens When an Acquaintance Buys Your Data?: A New Privacy Harm in the Age of Data Brokers, 34 Yale J. on Reg. 667 (2017).
9 Brent Mittelstadt, From Individual to Group Privacy in Biomedical Big Data, in Big Data, Health Law, and Bioethics 175 (I. Glenn Cohen et al. eds., 2018); Sandra Wachter & Brent Mittelstadt, A Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data and AI, 2 Colum. Bus. L. Rev. (2019).
10 Aaron Chou, What’s In The “Black Box”? Balancing Financial Inclusion and Privacy in Digital Consumer Lending, 69 Duke L.J. 1183, 1192 (2020).
11 James Vincent, Woman fired after disabling work app that tracked her movements 24/7, The Verge (May 13, 2015, 7:01 AM), https://www.theverge.com/2015/5/13/8597081/worker-gps-fired-myrna-arias-xora.
12 Ajay Agrawal et al., Prediction Machines: The Simple Economics of Artificial Intelligence (2018).
13 Eric Horvitz & Deirdre Mulligan, Data, privacy, and the greater good, Science, July 17, 2015, at 253.
14 Molly Olmsted, A Prominent Priest Was Outed for Using Grindr. Experts Say It’s a Warning Sign, Slate (July 21, 2021, 7:03 PM), https://slate.com/technology/2021/07/catholic-priest-grindr-data-privacy.html [https://perma.cc/DN5H-J8NA].
15 Ryen W. White et al., Detecting Neurodegenerative Disorders from Web Search Signals, 1 npj Digit. Med. 1 (Apr. 23, 2018), https://www.nature.com/articles/s41746-018-0016-6.pdf [https://perma.cc/9KDK-VBGZ]. In this case, the source of the information was clear: Microsoft itself, which operates Bing, permitted the researchers to study anonymized databases. For an analysis of the import of such data in the U.S., where it is now well beyond the scope of the privacy and security protections guaranteed pursuant to the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health (HITECH) Act, see National Committee on Vital and Health Statistics, Subcommittee on Privacy, Confidentiality, and Security, Health Information Privacy Beyond HIPAA (2019), at https://ncvhs.hhs.gov/wp-content/uploads/2019/07/Report-Framework-for-Health-Information-Privacy.pdf [https://perma.cc/A5JW-QLW8].
16 As I elaborate in a recent book, one critical goal of technology policy should be stopping such arms races. Frank Pasquale, New Laws of Robotics (2020); see also Frank Pasquale, Paradoxes of Privacy in an Era of Asymmetrical Social Control, in Big Data, Crime and Social Control (Aleš Završnik ed., 2018) (on the encryption arms race). As Jathan Sadowski has argued, “cyberhygiene” arguments all too easily degenerate into victim-blaming. Jathan Sadowski, Too Smart (2019).
17 Of course, in the right hands, the data could be quite useful. We may want our doctors to access such information, but we need not let banks, employers, or others use it. That is one foundation of the licensing regime I will describe in Part III: ensuring persons can generally presume data associated with them is being used to advance their well-being, rather than to stigmatize or exclude them.
18 For a powerful critique of extant privacy laws in the U.S. and Europe, see Julie Cohen, Between Truth and Power (2019).
19 On the fallacies inherent in this model in the context of health privacy, see Barbara J. Evans, Much Ado About Data Ownership, 25 Harv. J.L. & Tech. 70 (2011).
20 Julie Cohen, Turning Privacy Inside Out, 20 Theoretical Inquiries L. 1, 1 (2019) (“[P]rivacy’s most enduring institutional failure modes flow from its insistence on placing the individual and individualized control at the center.”); Bart Willem Schermer et al., The Crisis of Consent: How Stronger Legal Protection May Lead to Weaker Consent in Data Protection, 16 Ethics & Info. Tech. 171 (2014); Gabriela Zanfir-Fortuna, Forgetting About Consent: Why the Focus Should Be on ‘Suitable Safeguards’ in Data Protection Law (May 10, 2013) (unpublished working paper) (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2261973 [https://perma.cc/DU9X-97ND]).
21 Leaders of a Beautiful Struggle v. Balt. Police Dep’t, 1:20-cv-00929-RDB (4th Cir., June 24, 2021). A sharply divided Fourth Circuit Court of Appeals ruled the program unconstitutional.
22 Frank Pasquale, Privacy, Antitrust, and Power, 20 Geo. Mason L. Rev. 1009 (2013); Andreas Mundt, Bundeskartellamt prohibits Facebook from combining user data from different sources, Bundeskartellamt (Feb. 7, 2019), https://www.bundeskartellamt.de/SharedDocs/Meldung/EN/Pressemitteilungen/2019/07_02_2019_Facebook.html [https://perma.cc/X6US-RL7A] (“In view of Facebook’s superior market power, an obligatory tick on the box to agree to the company’s terms of use is not an adequate basis for such intensive data processing. The only choice the user has is either to accept the comprehensive combination of data or to refrain from using the social network. In such a difficult situation the user’s choice cannot be referred to as voluntary consent.”).
23 A E W, I U: T
I S  P, D,  C
P (2021).
24 Even in the health care system, where access to such information is supposed to be guaranteed by federal health privacy laws, patients find considerable barriers to the exercise of their rights.
25 For a sophisticated response to this problem, see Václav Janeček & Gianclaudio Malgieri, Commerce in Data and the Dynamically Limited Alienability Rule, 21 German L.J. 924 (2020).
26 Daniel J. Solove & Danielle Keats Citron, Standing and Privacy Harms: A Critique of TransUnion v. Ramirez, 101 B.U. L. Rev. Online 62 (2021).
27 Pam Dixon & Bob Gellman, The Scoring of America (World Privacy Forum, Apr. 2, 2014), http://www.worldprivacyforum.org/wp-content/uploads/2014/04/WPF_Scoring_of_America_April2014_fs.pdf [https://perma.cc/GP6J-L75J]; Danielle Keats Citron & Frank Pasquale, The Scored Society: Due Process for Automated Predictions, 89 Wash. L. Rev. 1, 31 (2014).
28 Margaret Hu, Big Data Blacklisting, 67 Fla. L. Rev. 1735 (2016).
29 Adam Satariano, Europe’s Privacy Law Hasn’t Shown Its Teeth, N.Y. Times, Apr. 28, 2020, at B1.
30 In this way, my proposals here are an extension of the ideas I develop in Frank Pasquale, Data Informed Duties for AI Development, 119 Colum. L. Rev. 1917, 1917 (2019) (“Law should help direct—and not merely constrain—the development of artificial intelligence (AI). One path to influence is the development of standards of care both supplemented and informed by rigorous regulatory guidance.”).
31 For example, the European Data Protection Board is exploring certification. Eur. Data Prot. Bd., Guidelines 1/2018 on certification and identifying certification criteria in accordance with Articles 42 and 43 of the Regulation, Version 3.0 (June 4, 2019), https://edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-12018-certification-and-identifying_en [https://perma.cc/Q365-M99F] (“Before the adoption of the GDPR, the Article 29 Working Party established that certification could play an important role in the accountability framework for data protection. In order for certification to provide reliable evidence of data protection compliance, clear rules setting forth requirements for the provision of certification should be in place. Article 42 of the GDPR provides the legal basis for the development of such rules.”).
32 In future work, I hope to compare Brown’s proposal with the GDPR’s definition of “legitimate purposes.” Under the GDPR, “Personal data shall be … collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes.” Chris Jay Hoofnagle et al., The European Union general data protection regulation: what it is and what it means, 28 Info. & Comm. Tech. L. 65, 77 n. 82 (2019) (quoting Council Directive 2016/679 O.J. (L 119) art. 5(1)(b) (General Data Protection Regulation)).
33 Press Release, Sherrod Brown U.S. Sen. for Ohio, Brown Releases New Proposal That Would Protect Consumers’ Privacy from Bad Actors (June 18, 2020), (https://www.brown.senate.gov/newsroom/press/release/brown-proposal-protect-consumers-privacy [https://perma.cc/74HR-KJLA]).
34 Pasquale, supra note 2. For a deep development of the invisible minorities idea as a right to avoid being stigmatized by certain unreasonable inferences about groups, see Sandra Wachter & Brent Mittelstadt, supra note 9. See also Gianclaudio Malgieri & Jędrzej Niklas, Vulnerable Data Subjects, Computer L. & Sec. Rev. 37 (July 2020).
35 Data Accountability and Transparency Act (DATA Act), S. 20719, 116th Cong. § 102(b)(4) (as proposed to the Senate, 2020) [hereinafter DATA Act]. The proposed act states that data aggregators “shall not collect, use, or share, or cause to be collected, used, or shared, any personal data unless the aggregator can demonstrate that such personal data is strictly necessary to carry out a permissible purpose under section 102.” Id. at § 101. It also states that “a data aggregator shall not … derive or infer data from any element or set of personal data.”
36 Id. at § 102(a)(3). For European efforts to define a similar category, see Eur. Data Prot. Supervisor, Preliminary Opinion on data protection and scientific research, Eur. Data Prot. Supervisor (Jan. 6, 2020), https://edps.europa.eu/data-protection/our-work/publications/opinions/preliminary-opinion-data-protection-and-scientific_en [https://perma.cc/4T3H-7W9W].
37 Chad Terhune, They Know What’s in Your Medicine Cabinet, Bloomberg Businessweek (July 23, 2008, 12:00 AM), https://www.bloomberg.com/news/articles/2008-07-22/they-know-whats-in-your-medicine-cabinet [https://perma.cc/3ZY4-ZVMX]. Of course, the guaranteed issue provisions and ban on preexisting condition limitations in the 2010 Affordable Care Act (ACA) made such practices much less menacing to most consumers. However, the ACA could easily be repealed, or declared null and void by an activist Supreme Court. The rise of authoritarianism in the U.S. should further caution us to understand that no such rights (except of course those of the party in power and its allies) are permanently entrenched.
38 The proposed DATA Act’s “Prohibition On Discriminatory Use of Personal Data” is a method for shaping data collection, analysis, and use in a democratically accountable and forward-thinking way. DATA Act § 104. (“It is unlawful for a data aggregator to collect, use, or share personal data for … commercially contracting for housing, employment, credit, or insurance in a manner that discriminates against or otherwise makes the opportunity unavailable or ordered on different terms on the basis of a protected class.”). As defined by the DATA Act, “protected class” includes classifications based on “biometric information,” which would cover hand-motion monitoring (and many other, more remote forms of data collection and classificatory inference). “Protected class” is defined as “actual or perceived race, color, ethnicity, national origin, religion, sex, gender, gender identity, sexual orientation, familial status, biometric information, lawful source of income, or disability of an individual or group of individuals.” DATA Act § 3(20).
39 For an example of other such potential excessive uses, see Robert Pear, On Disability and on Facebook? Uncle Sam Wants to Watch What You Post, N.Y. Times (Mar. 10, 2019), https://www.nytimes.com/2019/03/10/us/politics/social-security-disability-trump-facebook.html [https://perma.cc/7FZ4-MFLK].
40 These rights claims will be particularly salient in the U.S., whose courts have expanded the scope of the First Amendment to cover many types of activity that would not merit free expression elsewhere, or would merit much less intense free expression protection, given the importance of competing rights to privacy, security, and data protection. On the general issue of data’s categorization as speech, see Jack M. Balkin, Information Fiduciaries and the First Amendment, 49 U.C. Davis L. Rev. 1183 (2016); Jane Bambauer, Is Data Speech?, 66 Stan. L. Rev. 57 (2014); Paul M. Schwartz, Free Speech vs. Information Privacy: Eugene Volokh’s First Amendment Jurisprudence, 52 Stan. L. Rev. 1559 (2000); James M. Hilmert, The Supreme Court Takes on the First Amendment Privacy Conflict and Stumbles: Bartnicki v. Vopper, the Wiretapping Act, and the Notion of Unlawfully Obtained Information, 77 Ind. L.J. 639 (2002); Eric B. Easton, Ten Years After: Bartnicki v. Vopper as a Laboratory for First Amendment Advocacy and Analysis, 50 U. Louisville L. Rev. 287 (2011).
41 Johanna Gunawan et al., The COVID-19 Pandemic and the Technology Trust Gap, 51 Seton Hall L. Rev. 1505 (2021).
42 ACLU v. Clearview AI, Case 20 CH 4353 (Ill. Cir., Aug. 27, 2021), at 10 (“BIPA’s speaker-based exemptions do not appear to favor any particular viewpoint. As BIPA’s restrictions are content neutral, the Court finds that intermediate scrutiny is the proper standard.”).
43 Joint investigation of Clearview AI, Inc. by the Office of the Privacy Commissioner of Canada, the Commission d’accès à l’information du Québec, the Information and Privacy Commissioner for British Columbia, and the Information Privacy Commissioner of Alberta, PIPEDA Findings #2021-001, para. 67, https://www.priv.gc.ca/en/opc-actions-and-decisions/investigations/investigations-into-businesses/2021/pipeda-2021-001/ [https://perma.cc/XN8W-LKV8] (“Clearview has neither explained nor demonstrated how its activities constitute the expression of a message relating to the pursuit of truth, participation in the community or individual self-fulfillment and human flourishing.”).
44 M A, A M (2020).
45 Id. at 72.
46 Paul Erickson et al., How Reason Almost Lost Its Mind: The Strange Career of Cold War Rationality (2013); S.M. Amadae, Game Theory, Cheap Talk and Post-Truth Politics: David Lewis vs. John Searle on reasons for truth-telling, 48 J. Theory Soc. Behav. 306 (2018).
47 Os Keyes, Counting the Countless, Real Life (Apr. 8, 2019), https://reallifemag.com/counting-the-countless/ [https://perma.cc/7M9J-4XFK].
48 Note that the DATA Act has an exception for “de minimis” collection, analysis, and use: “Any person that collects, uses, or shares an amount of personal data that is not de minimis; and does not include an individual who collects, uses, or shares personal data solely for personal reasons.” DATA Act, § 3(8)(A)-(B). The “large-scale” proviso of the licensure regime proposed in this work is also meant to shield smaller players, but on a larger scale.
49 For a broader argument on the limits of First Amendment protection for operational code, see David Golumbia, Code is Not Speech (Apr. 13, 2016) (unpublished draft) (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2764214 [https://perma.cc/G8UG-XMAQ]).
50 For an analysis of the analogy between many forms of big data processing and experiments that are clearly deemed human subjects research, see James Grimmelmann, The Law and Ethics of Experiments on Social Media Users, 13 Colo. Tech. L.J. 219 (2015), https://ctlj.colorado.edu/wp-content/uploads/2015/08/Grimmelman-final.pdf [https://perma.cc/7N6K-JJHG].
51 On policy rationales for limiting automated bot speech, see Frank Pasquale, Preventing a Posthuman Law of Freedom of Expression, in The Perilous Public Square (David Pozen ed., 2020).
52 U.S. Fair Credit Reporting Act (FCRA) § 609, 15
U.S.C. § 1681(g) (2011).
53 The FCRA provides further language limiting what information may be contained in a consumer report. 15 U.S.C. 1681(c) (2011). Consumer reports cannot contain: Title 11 cases over 10 years old; civil suits, judgments, or arrest records over seven years old; paid tax liens over seven years old; accounts placed for collection or charged to profit and loss over seven years old; or any other adverse information, other than criminal convictions, over seven years old. These restrictions have not been successfully challenged as content-based restrictions under the First Amendment.
54 A creditor is defined by the Equal Credit Opportunity Act as those who “extend, renew, or continue credit.” 15 U.S.C. § 1691(a)(e) (2010).
55 15 U.S.C. § 1691(a).
56 In New York, legislation was passed that bans consumer reporting agencies and lenders from using a consumer’s social network to determine creditworthiness. The bill specifically bans companies from using the credit scores of people in an individual’s social network as a variable in determining their credit score. Keshia Clukey, Social Networks Can’t Go Into Credit Decisions Under N.Y. Ban, Bloomberg L. (Nov. 25, 2019, 5:13 PM), https://news.bloomberglaw.com/banking-law/social-networks-cant-go-into-credit-decisions-under-n-y-ban [https://perma.cc/LLA7-FMXB].
57 Nicola Jentzsch, Financial Privacy: An International Comparison of Credit Reporting Systems (2007).
58 Id. The same restriction applies in the U.S.: “A consumer reporting agency shall not furnish … a consumer report that contains medical information (other than medical contact information treated in the manner required under section 1681c(a)(6) of this title) about a consumer, unless—the consumer affirmatively consents, … if furnished for employment purposes, … the information is relevant to the process or effect[s] the employment or credit transaction, … the information to be furnished pertains solely to transactions, accounts, or balances relating to debts arising from the receipt of medical services, products, or devices, … a creditor shall not obtain or use medical information … in connection with any determination of the consumer’s eligibility, or continued eligibility, for credit.” Fair Credit Reporting Act, 15 U.S.C. § 1681b(g) (2020).
59 State Laws Limiting Use of Credit Information for Employment, MicroBilt (2017), https://www.microbilt.com/Cms_Data/Contents/Microbilt/Media/Docs/MicroBilt-State-Laws-Limiting-Use-of-Credit-Information-For-Employment-Version-1-1-03-01-17-.pdf [https://perma.cc/9LSS-GLXZ].
60 Id.
61 Assemb. Floor Analysis, AB-22 Employment: Credit Reports, http://leginfo.legislature.ca.gov/faces/billAnalysisClient.xhtml?bill_id=201120120AB22 [https://perma.cc/XPQ9-QM8U]
(last visited May 13, 2021). Groups include unem-
ployed people, low-income communities, commu-
nities of color, women, domestic violence survivors,
families with children, divorced individuals, and
those with student loans and/or medical bills. N.Y.C. Comm’n on Hum. Rts., Stop Credit Discrimination in Employment Act: Legal Enforcement Guidance (N.Y.C. Comm’n on Hum. Rts. 2015), https://www1.nyc.gov/site/cchr/law/stop-credit-discrimination-employment-act.page [https://perma.cc/4TSX-8ECH].
62 McKenna Moore, Biden wants to change how credit scores work in America, Fortune (Dec. 18, 2020, 11:27 AM), https://fortune.com/2020/12/18/biden-public-credit-agency-economic-justice-personal-finance-racism-credit-scores-equifax-transunion-experian-cfpb/ [https://perma.cc/5Y5P-9NLS]; Amy Traub, Establish a Public Credit Registry, Demos (Apr. 3, 2019), https://www.demos.org/policy-briefs/establish-public-credit-registry [https://perma.cc/R998-DA6G]; The Biden Plan for Investing in Our Communities Through Housing, https://joebiden.com/housing/ (last visited July 12, 2021).
63 Frank Pasquale, From Territorial to Functional Sovereignty: The Case of Amazon, LPE Project (Dec. 6, 2017), https://lpeproject.org/blog/from-territorial-to-functional-sovereignty-the-case-of-amazon/ [https://perma.cc/52YQ-5RBK].
64 Proposal for a Regulation of the European Parliament and of the Council on a Single Market For Digital Services (Digital Services Act), at 3, COM (2020) 825 final (Dec. 15, 2020) (“The operational threshold for service providers in scope of these obligations includes those online platforms with a significant reach in the Union, currently estimated to be amounting to more than 45 million recipients of the service. This threshold is proportionate to the risks brought by the reach of the platforms in the Union; where the Union’s population changes by a certain percentage, the Commission will adjust the number of recipients considered for the threshold, so that it consistently corresponds to 10% of the Union’s population.”); id. at 31 (“Such significant reach should be considered to exist where the number of recipients exceeds an operational threshold set at 45 million, that is, a number equivalent to 10% of the Union population. The operational threshold should be kept up to date through amendments enacted by delegated acts, where necessary.”). Such thresholds reflect a risk-focused model of regulation commended by the German Data Ethics Commission. Data Ethics Comm’n, Fed. Gov’t Ger., Opinion of the Data Ethics Commission (2019), 177.
65 Proposal for a Regulation of the European Parliament and of the Council on contestable and fair markets in the digital sector (Digital Markets Act), at 36–37, COM (2020) 842 final (Dec. 15, 2020) (“A provider of core platform services shall be presumed [an important gateway for business users to reach end users] where it provides a core platform service that has more than 45 million monthly active end users established or located in the Union and more than 10,000 yearly active business users established in the Union in the last financial year.”).
66 Cal. Civ. Code § 1798.140(c)(1)(B) (West 2020) (covering any business that “[a]lone or in combination, annually buys, receives for the business’s commercial purposes, sells, or shares for commercial purposes, alone or in combination, the personal information of 50,000 or more consumers, households, or devices”).
67 See, e.g., 16 C.F.R. § 318.5(b)–(c) (“A vendor of personal health records or PHR related entity shall provide notice to prominent media outlets serving a State or jurisdiction, following the discovery of a breach of security, if the unsecured PHR identifiable health information of 500 or more residents of such State or jurisdiction is, or is reasonably believed to have been, acquired during such breach.”); Security Breach Notification Laws, https://www.ncsl.org/research/telecommunications-and-information-technology/security-breach-notification-laws.aspx [https://perma.cc/BS39-J2RE] (last visited May 13, 2021) (36 states set notification thresholds at 500 or 1,000).
68 Frank Pasquale, Grand Bargains for Big Data: The Emerging Law of Health Information, 72 Md. L. Rev. 682 (2013); Frank Pasquale, Redescribing Health Privacy: The Importance of Information Policy, 14 Hous. J. Health L. & Pol’y 95 (2014).
69 To provide the proper level of resources, the “self-funding agency” model is useful. Certain financial and medical regulators are funded in part via fees paid by regulated entities that must apply to engage in certain activities. For example, fees paid pursuant to the Prescription Drug User Fee Act (PDUFA) fund the Food and Drug Administration (which essentially licenses drugs for sale in the U.S.). For background on this act and its amendments, see Prescription Drug User Fee Amendments, https://www.fda.gov/industry/fda-user-fee-programs/prescription-drug-user-fee-amendments [https://perma.cc/5THX-NTKD] (last updated Aug. 25, 2021).
About the Author
F P is a professor of law at Brooklyn Law School, an aliate
fellow at the Yale Information Society Project, and the Minderoo High Impact
Distinguished Fellow at the AI Now Institute. He is also the chairman of the
Subcommittee on Privacy, Condentiality, and Security of the National Com-
mittee on Vital and Health Statistics at the U.S. Department of Health and
Human Services. Pasquale is an expert on the law of articial intelligence,
algorithms, and machine learning, and author of New Laws of Robotics:
Defending Human Expertise in the Age of AI (Harvard University Press, 2020).
His widely cited book, The Black Box Society (Harvard University Press, 2015),
develops a social theory of reputation, search, and nance, and promotes
pragmatic reforms to improve the information economy, including more
vigorous enforcement of competition and consumer protection law. The Black
Box Society has been reviewed in Science and Nature, published in several
languages, and its h anniversary of publication has been marked with an
international symposium in Big Data & Society.
© 2021, Frank Pasquale.
Acknowledgments
I wish to thank David Baloche, Jameel Jaffer, Margot Kaminski, Amy Kapczynski, Gianclaudio Malgieri, Rafi Martina, Paul Ohm, Paul Schwartz, and
Ari Ezra Waldman for very helpful comments on this work. I, of course, take
responsibility for any faults in it. I also thank the Knight First Amendment
Institute and the Law and Political Economy Project for the opportunity to
be in dialogue on these critical issues.
About the Knight First Amendment Institute
The Knight First Amendment Institute at Columbia University defends the
freedoms of speech and the press in the digital age through strategic litiga-
tion, research, and public education. It promotes a system of free expression
that is open and inclusive, that broadens and elevates public discourse,
and that fosters creativity, accountability, and effective self-government.
knightcolumbia.org
Design: Point Five
Illustration: ©Erik Carter