Harvard Data Science Review • Issue 3.1, Winter 2021
The People Machine: The Earliest Machine Learning?
Francine Berman¹,²,³
Jill Lepore⁴
¹ Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
² Berkman Klein Center for Internet and Society, Harvard University, Cambridge, Massachusetts, United States of America
³ Department of Computer Science, School of Science, Rensselaer Polytechnic Institute, Troy, New York, United States of America
⁴ Department of History, The Faculty of Arts and Sciences, Harvard College, Cambridge, Massachusetts, United States of America
The MIT Press
Published on: Jan 29, 2021
DOI: https://doi.org/10.1162/99608f92.87b0ec26
License: Creative Commons Attribution 4.0 International License (CC-BY 4.0)
ABSTRACT
In September 2020, the Harvard Data Science Initiative (HDSI) invited Jill Lepore and Fran Berman to a
special event to talk about data science and the Simulmatics Corporation, the focus of Jill's new book If Then:
How the Simulmatics Corporation Invented the Future. Jill’s book tells the compelling story of one of the first
corporations to use the power of digital data to both understand and illuminate the world around us, as well as
to manipulate the population and skew behavioral outcomes, a kind of power hyped as “the People Machine”
by Simulmatics’ PR team. The conversants, data scientist Fran Berman and writer and historian Jill Lepore,
had met as Radcliffe Fellows in 2019 and are both passionate about the societal impacts of technology. They
were delighted to carry on their conversation, started at Radcliffe, about data, society, information, truth and
trust at the HDSI event. This piece is an edited and streamlined version of their HDSI discussion, whose live
video recording can be found below.
Keywords: data analytics, behavioral science, ethics, simulation, elections, democracy
Fran Berman (FB): The Simulmatics Corporation, born in 1959 and dead by 1970, was a window to today, a
tech-powered world in which data is everywhere and drives virtually everything. Jill's book, If Then: How the Simulmatics Corporation Invented the Future, describes the extraordinary history of the company, one of the first
companies to automate, simulate, and predict human behavior for commercial purposes. The book explores the
company's successes and ultimate failure in the 1960s and its lessons for us now... History and tech, tech and
culture—there's a lot to explore here. Jill and I were Fellows in 2019-2020 at the Radcliffe Institute for
Advanced Study. She was finishing the book and working on other projects and my work explored the social
and environmental impacts of the Internet of Things. It was a match made in heaven, and at Radcliffe. It's
wonderful to continue our wide-ranging conversations today, courtesy of the Harvard Data Science Initiative.
I’ll start by asking Jill to talk a little bit about the book and how she came to write it.
Jill Lepore (JL): Sure. First, I want to thank the Harvard Data Science Initiative for the invitation to this event
and everyone who's come out on Friday afternoon for this conversation, but especially Fran. It is such an honor
and a real treat to be in conversation with you. It is one of the remarkable things about a place like Radcliffe—
this is exactly the kind of conversation such an institute is hoping to cultivate. I'm really grateful, and I know
the book also benefited from the conversations that you and I had over lunches before everybody had to go
home last year. Thank you for doing this.
Jill Lepore in conversation with Fran Berman
The book tells the story of the Simulmatics Corporation. I came across the story in 2015. I just needed a
paragraph for an essay I was writing on the history of the polling industry, because it had become clear to me
that polling was being replaced by data analytics and political prediction companies, and I needed to know,
when did that happen? In a journal article, I came across a fleeting mention of the Simulmatics Corporation
and its role in the 1960 election, working for the John F. Kennedy campaign. Historians look for evidence in
the archives, so I decided to look for the archives of this company. Simulmatics, founded in 1959 and
bankrupted by 1970, was a very small company but it had a significant history and was quite self-conscious
about its own importance. I couldn't find the papers anywhere; its archives had vanished. But I did find a
wealth of material at MIT, in the papers of the head of the company’s Research Board. He had been a political
scientist at MIT. And so, I went diving through those, and I found that not only had Simulmatics pioneered the
work of election simulation in 1960, but they'd gone on to undertake a series of projects—most of which, at
some level, failed—but which all share the same ambition, which was that you could use computer technology
to predict human behavior. And then you could sell that as a product to other entities. They formed a
corporation, went public in 1961, raised a fair amount of money, and had some really interesting clients over
the course of the 1960s. Until I found Simulmatics, I hadn’t understood how we got from Cold War behavioral
science to Facebook and social media. This company is a missing link.
FB: I found that really fascinating about the book, and, truth be told, I hadn’t known about Simulmatics before
I read the book. In many ways, Simulmatics was truly a company ahead of its time: Its mass cultural model
predated Amazon. Its relationship and its partnership with the New York Times launched data journalism and
data-driven election predictions. Its work to assess and predict voter behavior in the 1960s predated Cambridge
Analytica’s work to predict and manipulate voter behavior in 2016. Its work with the Defense Department
during the Vietnam War promoted the notion of war simulation to optimize war reality. All of this, from our
perspective, was way ahead of its time. Yet Simulmatics’ technologies could not meet its ambitions. Its
leadership was myopic, its science was sloppy, and it ultimately failed to deliver. Its problems were both of its
time and of our time.
As a historian, what do you see as the lessons learned from the Simulmatics experience? Do we have greater or
fewer controls at this point with which to rein in companies than we did then?
JL: That's a really interesting question. I think we have fewer rules, and among the reasons is Simulmatics
itself. Simulmatics scientists were quite brilliant, and they were extremely well-intentioned. These weren’t
nefarious people; they weren't trying to destroy institutions: they were just trying to figure out how to use these
new tools. After Kennedy won in 1960, the Simulmatics Corporation claimed credit for his victory. They said
that he had done everything that they had advised him to do, and then he won, and, therefore, he won because
of their advice. They took credit, without demonstrating that they deserved credit, and this really annoyed the
Kennedy campaign and incoming administration. American newspaper editorials condemned Kennedy for
having used this tool.
This was partly because—in another kind of resonance with our day—there was tremendous anxiety about the
future of work in the late 1950s and early 1960s, because of automation. Kennedy—because Democrats are the party of labor—had campaigned on it, had made it a plank in his platform, and was promising all these job retraining programs and job subsidies for people displaced by automation. So, not only was there a general fear
of computers controlling our minds—which was just a cultural anxiety in the 1950s and early 1960s—but there
was a specific fear that this guy who had run against automation was being controlled by a giant robot. I was
fascinated to see how clear-cut it was to people at the time that this was unethical. There had been some
consideration of whether or not it was in fact illegal. Today, we no longer question that sort of thing.
FB: Is it the workforce issue that's relevant here, i.e., who is doing the predicting? Why would they be any
more annoyed that the predictions came from behavioral science and machines than if they came from a really
spot-on ad agency that gave them the exact same advice?
JL: Sure, but remember the first big ad agency-driven presidential campaign took place in 1952, when the
Eisenhower campaign hired Rosser Reeves to write television spots. Eisenhower was the first presidential
candidate to appear in his own TV ads. That was extraordinarily controversial, so controversial that the
campaign of his rival Adlai Stevenson, the Democrat, dubbed Eisenhower’s campaign ‘the Cornflakes
campaign.’ Stevenson’s campaign went to the FCC (Federal Communications Commission) and said, ‘This has
got to be illegal. You cannot have a presidential candidate hiring an ad campaign to write TV spots that are like
those for toothpaste and laundry detergent. Like, that's just got to be illegal. That's got to be a violation of
either the FCC or the FEC (Federal Election Commission) in some way. This can't be allowed.' And they didn't
get anywhere with that, but in any case, Stevenson said, ‘This is the problem with American politics,
this kind of crap. This will destroy the country.’ By 1960, only eight years later, there’d been a kind of tacit
acceptance of using advertising campaigns, and, of course, also the advice of pollsters, which presidents had
been using since the 1930s. But there was, nevertheless, a lot of anxiety about it and, on top of that, just a
tremendous cultural anxiety about computers.
Consider, for example, the 1957 film Desk Set with Spencer Tracy and Katharine Hepburn. Tracy plays an MIT
systems engineer and Hepburn runs a fact-checking department, and Tracy is going to be installing this giant—
it’s supposed to be like a UNIVAC—it’s called EMERAC, a giant room-sized computer in her department. It’s
a screwball comedy romance between the two of them, but the whole movie is about the anxiety of the
displacement of people by machines. If you walk that to its logical conclusion, if John F. Kennedy could hire a company that could simulate the election, predict the outcome, and then tailor his message in order to achieve the desired outcome, then why, at some point, even bother with voting? That's the anxiety, that you could displace the voters themselves. I think we still need to be asking that question.
FB: Yes! We still have exactly the same anxiety.
JL: Right. If voters are just tools, if we're just pulling the lever while being marched around to do the bidding of an algorithm, then I think to many people, that's how things feel.
FB: I want to get back to that. Your comments remind me of many of the great discussions on technology and
society we have at the Berkman Klein Center for Internet and Society at Harvard. For all my techie friends and
colleagues, I wanted to ask you about the kind of tech the Simulmatics Corporation actually did. The popular
press called the data, software, and hardware that Simulmatics used ‘the People Machine.’ You write in the
book: “The machine, crammed with microscopic data about voters and issues, could act as a macroscope. You
could ask it any question about the kind of move that a candidate might make, and it would be able to tell you
how voters, down to the tiniest segment of the electorate, would respond.”
Let's talk about how this actually worked. How do we characterize their computers, data, and models in today's
terms? Were the programs data-driven simulations? Did the models utilize machine learning as we would
recognize it or anything that would resemble today's AI? How were their models vetted? Where did the data
come from?
JL: This is a really interesting question, and I spent some time with this because calling what they did ‘the
People Machine’ was the work of their PR guy, the Director of Public Relations for the company—who in fact
wrote an essay for Harper's Magazine about the company without revealing that he was the PR guy for the
company. So, a lot of it is just boosterism and flim-flam. What even is a “People Machine”? It’s just a program
written in FORTRAN, and a bunch of data. The “People Machine”—that's just boosterism. And, to be fair to the scientists who worked on the project, a lot of them were deeply troubled by the way the work was
characterized by their PR. One of them was Robert Abelson, a quite distinguished Yale scholar, who had been
one of the founders of the company. After Ed Greenfield, the president of the company, appeared on CBS
Radio and took credit for Kennedy’s victory and said, ‘Our People Machine can simulate anything,’ Abelson
wrote basically a letter of resignation. He said, ‘If you do that again, I'm out. Like, I can’t be answerable to this
nonsense. We did not do this, we did not do that. And we can't do this. We can't do that. We ran this one
program, man, calm down.’
It’s useful to remember just what a small operation this really was. Simulmatics never owned its own
computers, which is sometimes hard for people to believe, right, but there just were not even a lot of computers
around. It’s 1959. There was at that point one computer at MIT—I think it was an IBM 704—for all of the
New England schools that were in a consortium to use it. You could never get time on it. They didn't have time-sharing—this was before time-sharing—so you had to get a slot to do anything. The New York office used IBM
machines at the IBM Service Center in New York. IBM world headquarters was in downtown New York, but
also had a service center where you could rent time on—I think probably a 704. Most of what they were doing
was collecting and preparing data. I tried at some point in the book—I worked really hard to find a fact that
could communicate to the reader the speed of an IBM 704 against, like, an iPhone 6. It’s just laughable. I
mean, I could have done their entire project while I spoke this sentence, but it took them years. So you might
wonder: why elections?
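For scale, here is a back-of-the-envelope version of that comparison in Python. Both throughput figures are order-of-magnitude assumptions for illustration (the 704 is commonly credited with roughly 10^4 operations per second; a 2014-era phone sustains 10^9 or more); they are not numbers from the book or the talk.

# Rough, order-of-magnitude comparison of an IBM 704 with a modern phone.
ibm_704_ops_per_sec = 1e4   # assumed throughput of the IBM 704
phone_ops_per_sec = 1e9     # assumed, deliberately conservative phone figure

speedup = phone_ops_per_sec / ibm_704_ops_per_sec
print(f"rough speedup: {speedup:,.0f}x")  # ~100,000x

# A job that occupied the 704 for a full year would take, on these figures:
seconds_per_year = 365 * 24 * 3600
print(f"one 704-year is about {seconds_per_year / speedup:.0f} phone-seconds")  # ~315 s

On these assumptions, a year of 704 time collapses to a few minutes of phone time, which is roughly the order of magnitude behind 'while I spoke this sentence.'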
After the Second World War, social scientists who were interested in quantification wanted to figure out how to
do predictive work because that's what the government wanted. They wanted predictive social science in order
to wage a war of ideas against communism. They needed to think about how the ideas get into people's heads
and how to change their minds. That was what the Cold War was about: don't believe in communism, come
with us! Defect, be capitalist! So, it was all about how to figure out messages and how to send them to people.
Now, the Ford Foundation, the funders of behavioral science, wanted to be able to predict how messages would affect people. Well, what's the best single way to begin studying that problem? It's voting, because voting
generates its own data. You have election returns every two years. And then you could match that up with
census data. And then we also have pretty extensive public opinion data. So, these guys figured out, all right,
here's a way to test whether we can make a predictive model of human behavior. We will predict how people
will vote based on how they voted in the past and what public opinion polls tell us about them, but you don't
have information down to the named individual level. You have aggregate data about a county or people who
are registered as Republicans in a particular precinct, say.
They called it “massive data”: their original project was the biggest social science research project up to that
point. What they did was very clever. They wrote to Gallup and to Elmo Roper, the pollster. And they said,
‘Can we have all your old punch cards if you haven't thrown them away? Can we have your punch cards from
all of your previous surveys?' And so, Gallup and Roper said okay, and Simulmatics got all these punch cards
from public opinion surveys, beginning in 1952, and then they had all the election returns and the census data.
They compiled it all and aggregated it in such a way so that, if one poll asked a question like ‘Do you support Eisenhower's urging us to be more involved in Korea?’, and another poll asked, ‘Do you think Eisenhower should take a stronger stance against communism?’—they would somehow treat those as one poll about a stronger stand on communism, because the different pollsters asked somewhat different questions.
They had to work to standardize the data. They took all the voters and came up with 480 possible voter types,
like ‘New England Catholic white woman who voted for Kennedy’—that’d be a type. And then they took all
the issues on which people had been questioned and reduced them to 52 issue clusters. And then from this, they
constructed an imaginary population of 3,000 possible individuals on which they could test how, if you emphasized one issue, people would change their opinions about other issues.
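To make that pipeline concrete, here is a minimal Python sketch of the kind of aggregation just described: punch-card responses reduced to voter types and issue clusters, and an imaginary population used to read off predicted support. Only the figures 480, 52, and 3,000 come from the conversation; the synthetic data, the question-to-cluster mapping, and the readout rule are hypothetical stand-ins, not a reconstruction of Simulmatics' actual FORTRAN program, and the sketch leaves out the cross-issue dynamics mentioned above.

import random
from collections import defaultdict

random.seed(1960)

N_TYPES, N_CLUSTERS = 480, 52  # voter types and issue clusters, as in the talk

# Each poll respondent is reduced to (voter_type, issue_cluster, response).
# In reality, differently worded Gallup and Roper questions would first be
# mapped by hand onto a shared cluster (e.g., two Korea/communism questions
# treated as one 'stronger stand on communism' cluster).
def simulated_punch_cards(n_cards=10_000):
    for _ in range(n_cards):
        yield (random.randrange(N_TYPES),     # e.g., 'New England Catholic woman...'
               random.randrange(N_CLUSTERS),  # e.g., 'stronger stand on communism'
               random.random() < 0.5)         # agree / disagree (synthetic)

# Aggregate: for each (type, cluster) cell, tally agreement.
counts = defaultdict(lambda: [0, 0])          # cell -> [agree, total]
for vtype, cluster, agrees in simulated_punch_cards():
    counts[(vtype, cluster)][0] += agrees
    counts[(vtype, cluster)][1] += 1

def support(vtype, cluster):
    agree, total = counts[(vtype, cluster)]
    return agree / total if total else 0.5    # no data for a cell -> coin flip

# The 'imaginary population': 3,000 individuals drawn from the 480 types.
population = [random.randrange(N_TYPES) for _ in range(3000)]

def predicted_support(cluster):
    """Predicted share of the imaginary electorate agreeing on one cluster."""
    return sum(support(v, cluster) for v in population) / len(population)

print(f"predicted support on cluster 7: {predicted_support(7):.1%}")

The point of the 480-by-52 structure is that a candidate's question ('what happens if I emphasize this issue?') becomes a lookup and a weighted sum over voter types, which a 1960 machine could manage once the raw punch cards had been boiled down.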
FB: So your mileage kind of varied, then.
JL: Your mileage varied! There is actually a kind of hilarious parallel—if anyone has ever seen a 1947 film called Magic Town, starring Jimmy Stewart, in which he plays a pollster who's not making any money until he comes
across this town of 3000 people that happens to be a perfectly mathematically accurate representation of the
entire American electorate, so he just moves there. And instead of doing polls, he just walks around town and
asks people about issues, and then he reports that in his weekly column, and he's always right because he's
found that magical 3000 people. That's what Simulmatics is trying to do, but mathematically.
FB: You might have to give us a movie list to go with the book! Let’s talk about data science. You don’t have a
lot of good things to say about data scientists and data science, both in the book and in a great article you just
published in Nature, where another one of our Radcliffe Fellows, the amazing Jo Baker, is a senior editor.
Your summary of data science is thoughtful and it's worth us discussing. Here's what you say in the book: “In
the 2010s, a flood of money into universities attempted to remake the study of data with data science initiatives, data science programs, data science degrees, data science centers. Much academic research that fell under the label ‘data science’ produced excellent and invaluable work across many fields of inquiry, findings that would not have been possible without computational discovery. And no field should be judged by its worst
practitioners.
Still, the shadiest data science, like the shadiest behavioral science, grew in influence by way of self-
mystification, exaggerated claims, and all-around chicanery, including fast-changing, razzle-dazzle buzzwords
from ‘big data’ to ‘data analytics.’ Calling something ‘AI,’ ‘data science,’ and ‘predictive’ became a way to
raise huge amounts of venture capital funding. A credulous press aided in hyping those claims, and a broken
federal government failed to exercise even the least oversight.”
I remember a number of our discussions about these kinds of issues—always fun over great Vietnamese food—
and, not surprisingly, my own view is a little more charitable than yours. In my view, data science, behavioral
science, and other emerging fields thrive in the academic sector—in the wild, if you will—where exploration is
primary. But when they're employed in the private sector, they can be weaponized, and they're often
weaponized to exploit. So in some sense, isn't the kind of hyperbole and the other ills you mentioned, and I
agree with many of them, more of a characteristic of data science for hire than data science in the wild? Isn't
data science really the victim in this process rather than the perpetrator?
JL: I think that's fairly stated. And I should just say here, that when I criticize data science, I am offering a
fairly specific sort of criticism. I treasure my colleagues at the university and the college who do this work, and
I'm not picking a fight. But it tends to be the case across realms of knowledge that when a flood of money
comes into a field, it can often attract a lot of scalawags who just want the money, and I also think sometimes
some of that money is coming from the data for hire companies that want to launder their money. They want to
borrow the prestige of a university that really is interested in the pursuit of truth and making interesting
discoveries, for the sake of knowledge. They want to affiliate themselves with the university because what
they're doing feels dirty. And I think that's ethically concerning. There was a lot of that going on in the 1960s
with behavioral science in the Vietnam War. And with chemical weapons. All the student protests and anti-war
movement that happened at MIT and also at Harvard, and many other places, including Stanford, were students
saying, ‘Well, we thought our professors were studying chemistry, you know, they're out there, working on
napalm, what, that can't be okay.’ I worry about the lessons of the Vietnam War having been forgotten. I think,
today, that higher education is really deeply implicated, especially with fossil fuels.
I also think—you can persuade me that I'm wrong about this—but sometimes when I read descriptions of what
data science is, on some campuses, it really sounds like market research. And I think if you want to have an
undergraduate concentration in market research, you should have an undergraduate major in market research, which is fine, but then don't call it data science. It's also, on its face, a very weird name to give a
discipline. We could have a Department of Facts, I guess, but aren’t we all in the Department of Facts?
Research and analysis done with large bodies of data is invaluable; I’m less convinced that it fits within the
concept of “data science”; I think the name actually demeans that work. When I think about all the incredibly meaningful kinds of insights that we can gain about the natural world, say, that are unavailable to us without machine-learning-driven computational processing of data, that stuff is incredible.
FB: There’s definitely a lot to unpack, and I share some of your concerns from a really different point of view.
I think that we have to remember that data science has become an actual academic discipline. Data has always
been a tool for research, but as an academic discipline, it’s really kind of nascent. In creating data science
programs and departments, there are a lot of experiments out there, all adding to the evolution of core curricula
and research vehicles. One question that institutions are grappling with is where data science should live—in a
statistics department, a computer science department, as a multi-disciplinary program? What should the
curriculum be? Machine learning and statistics, to be sure. What about a class in ethics? What about a class in
data preservation and data stewardship? What about classes on databases or data visualization? What about
training and practice applying data science to other disciplines? It seems to me that over the last decade or two,
the data science community has really been trying to figure out how data science as a discipline best fits into various university academic environments.
Evolving the discipline is about data science in academia. There is also the whole issue of how data
science is or should be used in the private sector, where data science techniques and analysis currently serve as
a tremendous competitive advantage. If you're a company, it's hard to avoid collecting and using data because
that’s what your competitors are doing. A few years ago, Harvard Business Review said that data scientist was
the sexiest job of the twenty-first century. So there is also a lot of hype around it.
It’s also good to remember that data science is a tremendously valuable tool for exploration, but not the only
one. And it is critically important to think about data in context. What can it tell us and what are its limitations?
To me that argues for ‘humans in the loop’ to ensure our collection, analysis, and inferences based on data are
useful and appropriate.
So here's another question for you. Today, and perhaps in Simulmatics' day in the 1960s, there is an over-trust
in technology. We tend to trust the results of data analysis and computational models without asking where the
data came from, whether the models are representative, and what the context is in which the results are
meaningful. The outcomes of our analyses may have powerful consequences and we don’t ask often enough
whether there are humans in the loop. If you think about medicine, we don't just rely on the results of machines
and disease models. We always have doctors and medical professionals interpreting those results and putting
them in context, trying to figure out all the things that were not captured by the model and the data. Shouldn’t
other kinds of data-driven analyses be doing the same thing?
You see this as a historian: new technology is always racing ahead of the curve, and once we understand its
implications, societal controls promoting the public interest play catch up. We saw this during the Industrial
Revolution. We see this throughout history. What are the right ways to align technological innovation and
societal controls? What can we do today to really take the wild west of data science and civilize it so that it promotes the public good?
JL: One thing that I've tried to bring into public discourse is a way of thinking about data as a kind of
knowledge, not as the best kind of knowledge. We have a whole cult of data now where, you know, whatever
you do, if you say it's data-driven, somehow you can get money for it.
I've tried to take some time to think through why people use the term ‘data’ in that way. I gave a talk a few
years ago called “How Data Killed Facts.” The larger analytical framework that I give historically is to think
about the history of the elemental unit of knowledge over the last centuries—maybe we could begin with the
mystery: mysteries are things that God knows and we cannot know, like the mystery of conception. The
mystery of resurrection. The mystery of the afterlife. A lot of what the Reformation is about is replacing the
mystery with the fact. The fact comes from the law; it comes from a commitment to the idea that humans can actually know things if we observe them and submit our observations to rules of evidence involving corroboration and fairness, the kinds of rules that are then put to a jury.
Then that diffuses into the whole culture. The fact becomes central to the Scientific Revolution: the idea that you could establish facts through empirical observation and corroboration by other scientists, which historians call ‘the cult of the fact.’ It moves into journalism. By the time you get to the Enlightenment and the years before the Industrial Revolution, what historians call the great age of quantification, the fact is really challenged by the number. People are doing research across great distances. The best way for them to share their observations is
by counting things. Then we have the rise of statistics: people begin to count populations, and demography is born; we can count votes, and we can therefore consent to be governed. The rise of the number is incredibly important, in the US especially, given the nature of the census and its centrality to our political order.
It's not really until the 20th century that something like data comes to be called ‘data’ in our sense, which is ‘a set of numbers that human beings can't calculate; you need a machine to calculate them for you.’ You could even
begin with the 1890 census, calculated by tabulating machines. But by the time you get to 1950 and the
UNIVAC is running the census, that really is the beginning of the age of data, which happens also to be a time
when all of American culture orients itself around the idea that scientists are gods. In 1960, Time magazine's Man of the Year was ‘men of science,’ as if men of science are like gods. This marks an incredible elevation of data as evidence that you can't understand, that only machines or ‘men of science’ can understand. There's a whole
weird sexual and racial politics to that elevation, including the self-mystification of these early guys who
started Artificial Intelligence in 1956. The rise of the age of data is in a way a return to the age of mystery. The
machines are the gods and the computer scientists are the priests and the rest of us, we just have to look up and
hope that they get it right.
FB: I’ll pivot a bit and ask a question that frankly blows my mind. In an unexpected way, the founders of
Simulmatics really did predict the future! You write that Ithiel de Sola Pool, co-founder of Simulmatics, wrote “Toward the Year 2018” in 1968, 50+ years ago. He said, “By 2018, it will be cheaper to store information in computer banks than on paper…tax returns, Social Security records, census forms, military records, perhaps criminal records, security clearance files, school transcripts…bank statements, credit ratings, job records—would in 2018 be stored on computers that could communicate with one another over a vast international network.” You say, ‘People living in 2018 would be able to find out anything about anyone without ever leaving their desk. We will have the technological capability to do this. Will [we] have the legal right?’
Pool’s and your questions are relevant today. The question for all of us is what are the legal, moral, and societal
constraints that we need to create to make sure that technology is a tool for the public good, rather than the
public being a tool for the advancement of technology? Does anything give you hope today about any of this?
JL: Pool was just brilliant. And he foresaw that, but actually the takeaway of the essay is, ‘we'll have to see
what they do in 2018.’ He just kicks the can down the road, and the lesson of the 1960s—in particular Vietnam,
but certainly of this behavioral science research—is, no, you don't kick the can down the road, you don't just
invent Facebook and see if it destroys democracy. I think that if you worship people as disruptive innovators
for long enough and just throw money at them and indulge them, well, they can really screw things up. There’s
a set of ethical guidelines around research in almost every other area. We see that in the comeuppance that biologists had after Nuremberg about the medical research done by Nazis, and the incredible shuddering of physicists after the Manhattan Project and the bombing of Hiroshima. People said, ‘Okay. Clearly, we shouldn't
have done this, let's come up with some rules for how we proceed from here.’ I had the expectation, as I think
many people did, that 2016 would be that moment for people who deal with personal data, and it wasn't. It
hasn't been. It's worse now than it was then. We can say that's the federal government's problem, and it is the
federal government's problem, but it is also everyone's problem.
FB: All the techniques Cambridge Analytica used can be used by other organizations. They didn't own that
particular kind of approach. So yes, it is everyone’s problem.
OK, some questions from our remote audience. Here's one: what lessons can organizers of political and social
movements learn from the mistakes of Simulmatics and be aware of when trying to mobilize people? What's
the difference between leadership and manipulation?
JL: One of the long-term legacies—and I wouldn't put this by any means on the shoulders of Simulmatics,
which is, again, a small, failed company—but of the turn toward doing micro-targeted political messaging,
which starts in the 1950s—is that its critics said the lesson to be learned was ‘this will destroy the sense of a
common good.’ If you only ever talk to voters as if the way you expect them to vote is based on who they are
and what would be good for them, you have destroyed the fabric of our political institutions and our
constitutional system, which requires voters to think about what would be good for everybody. Republicanism
with a lower-case ‘r’ means that we are supposed to go to our polling place or fill out our form and put it in the
mail, with an eye toward—as a virtuous citizen in a classic sense of what ‘virtue’ is—who is going to best
represent the interests of all of the people. Not ‘who's going to lower my tax rate or do this or that for me.’ Our
entire political discourse has changed around this kind of messaging, the micro-targeting that divides voters' interests against other voters' interests and even makes political coalitions within a single political party hard to hold onto, because we have all kind of swallowed the idea that there are 480 voter types, or 10,000 voter types.
Whatever number it is. You’re not a citizen who's asked to think more broadly about what is good for all of us.
FB: In a way, I think that all of the customization that the data allows us is really a double-edged sword. I can
be in a cohort of people who have a particularly light, seaworthy kayak, and you can advertise kayaking gear to
me. I can also be fed news that is particularly tailored to my personal beliefs. In the first case, customization is
really convenient and does no harm. In the second case, customization means my information is highly tailored
and I may not get opposing perspectives; I may become more vulnerable to manipulation about critical societal
issues. With this kind of targeting, how are we supposed to come together as a country?
JL: Pool foresaw that in another essay he wrote, also in 1968, in a special issue of a magazine with J.C.R.
Licklider. He said, among the things I would predict: there will be the personal newspaper, and it'll be a
problem because we won't be able to have people belong to interest groups any longer because they'll just have
their personal interests. Then you have your highly atomized, profoundly alienated polity.
FB: It’s basically Spotify for news...Here’s another question: let’s stipulate that we’re able to predict human
behavior in some number of years, whether 5, 10, or 50. What are the lessons from history? What should we be
doing about this now? What can go well and what can go badly?
JL: I don't think there are laws of human behavior the way there are laws of gravity. Scientists should study
human behavior. There are reasons to do, obviously, quantitative work in that realm. But, for me, I actually
think that literature is more meaningful and poetry is more meaningful as ways to study human behavior. I'm a
humanist! History, I should emphasize, is not a predictive social science. There are lessons to be drawn from
the study of history, but they're not predictions.
FB: In some sense, it’s a kind of economics of the market, which is often patently unfair, don’t you think?
JL: Exactly. It's not idle to say we value some kinds of knowledge more than others when, even in a specific monetary sense, we value them differently.
FB: Here's a timely question from our audience. It's January 23rd of 2021, and the President calls you from the Oval Office and asks you to design future regulations in education. Where would you start? What would you tell the President?
JL: I don't really have an educational agenda. I think that there are some changes we've talked about, and
you've been really involved in thinking about data science initiatives at colleges and universities.
I was at Berkeley right before the world ended in February, and its data science program has a required history
course that I think is really interesting. Some other programs I've looked into have this stapled-on, ethics-like
unit, or just a kind of tacked-on history week that's taught by a computer scientist and not a historian. Now, I do think, given the magnitude of this turn in higher education, that it would do well to be better integrated with other parts of the university. This is why I'm here having this conversation with you! I think we need to be talking
across these differences and finding the best in one another's approaches.
FB: A last question from me. Cyberspace and technology have become critical infrastructure, especially during
the pandemic. If they continue to be largely unregulated, it's hard to curb their capacity for exploitation. This is
a problem for all of us. We’re in school online. We're ordering groceries online. We're having this conversation
online. And as each one of us logged in today, including you and me, our data was collected and carefully
categorized by the services we are using. We’re basically now living in a data wild west where we can be
exploited anytime, anywhere, by anybody.
Once our participation becomes non-optional, once our technologies become critical infrastructure, I believe
we must regulate them to promote the public interest and reduce risk. I know that in 2020 we don't have a lot of
confidence in government. I know it takes practice—the General Data Protection Regulation (GDPR) in
Europe is a really good example of that. But how do we get from here to a place where technology really is
working for us, rather than us working for technology? What do you think should happen with the federal
government, with societal norms and practices, and with the other tools that we have at our disposal, to really
improve things?
JL: I think the members of this audience probably have more clear ideas about what the best steps are to
proceed there. One thing that I fear is that the very wealthy will opt out of all these technologies. Then it'll be
the poor that are fully monitored, who can only engage in transactions in this form, and the wealthy will have
ways of avoiding those things. I think that change needs to happen pretty soon before those people find a way
to opt out and have even less incentive to turn their resources toward supporting the kind of really very radical
reforms that are required. I’d say my position on most of this stuff pretty closely tracks, say, the kind of more
hard-nosed, Elizabeth Warren position on regulation.
FB: Thanks so much Jill! What a pleasure to talk with you about all of these issues. What are you working on
next? Any more technology books in the works?
JL: Definitely not. I am working on getting through the Zoom semester. If we didn't already understand how
diminished we are by our reliance on certain kinds of technology and how helped we are by other kinds, I hope
this is a clarifying time. So I'm trying to just do the teaching and learn the lessons that can be learned from it.
Disclosure Statement
Francine Berman and Jill Lepore have no financial or non-financial disclosures to share for this interview.
©2021 Francine Berman and Jill Lepore. This interview is licensed under a Creative Commons Attribution
(CC BY 4.0) International license, except where otherwise indicated with respect to particular material
included in the interview.