Joi Ito's Web

Joi Ito's conversation with the living web.

Albert-László Barabási Photolog
Mon, Dec 31, 16:14 UTC

Science is built, enhanced, and developed through the open and structured sharing of knowledge. Yet some publishers charge so much for subscriptions to their academic journals that even the libraries of the world’s wealthiest universities such as Harvard are no longer able to afford the prices. Those publishers’ profit margins rival those of the most profitable companies in the world, even though research is largely underwritten by governments, and the publishers don’t pay authors and researchers or the peer reviewers who evaluate those works. How is such an absurd structure able to sustain itself—and how might we change it?

When the World Wide Web emerged in the ’90s, people began predicting a new, more robust era of scholarship based on access to knowledge for all. The internet, which started as a research network, now had an easy-to-use interface and a protocol to connect all of published knowledge, making each citation just a click away … in theory.

Instead, academic publishers started to consolidate. They solidified their grip on the rights to prestigious journals, allowing them to charge for access and exclude the majority of the world from reading research publications—all while extracting billions of dollars in subscription fees from university libraries and corporations. As a result, some publishers, such as Elsevier, the science, technology, and medicine-focused branch of the RELX Group publishing conglomerate, are today able to extract huge margins—36.7 percent in 2017 in Elsevier’s case, a higher margin that year than Apple, Google/Alphabet, or Microsoft.

And in most scholarly fields, it’s the most important journals that continue to be secured behind paywalls—a structure that doesn’t just affect the spread of information. Those journals have what we call high “impact factors,” which can skew academic hiring and promotions in a kind of self-fulfilling cycle that works like this: Typically, anyone applying for an academic job is evaluated by a committee and by other academics who write letters of evaluation. In most fields, papers published in peer-reviewed journals are a critical part of the evaluation process, and the so-called impact factor, which is based on the citations that a journal gets over time, is important. Evaluators, typically busy academics who may lack deep expertise in a candidate’s particular research topic, are prone to skim the submitted papers and rely heavily on the number of papers published and the impact factor—as a proxy for journal prestige and rigor—in their assessment of the qualifications of a candidate.
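The two-year impact factor itself is just a simple ratio, which makes its outsized role in hiring decisions all the more striking. Here is a rough sketch in Python, with invented numbers rather than any real journal’s figures:

```python
# Two-year impact factor: citations received this year to articles the
# journal published in the previous two years, divided by the number of
# citable items it published in those two years.

def two_year_impact_factor(citations_to_recent_articles, citable_items):
    return citations_to_recent_articles / citable_items

# A hypothetical journal that published 200 citable items in 2016-2017
# and drew 7,400 citations to them during 2018:
print(two_year_impact_factor(7400, 200))  # 37.0
```

Note that nothing in the ratio measures the quality of any individual paper, which is exactly why it is a poor proxy for the merits of a candidate’s work.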

And so young researchers are forced to prioritize publication in journals with high impact factors, faulty as they are, if they want tenure or promotions. The consequence is that important work gets locked up behind paywalls and remains largely inaccessible to anyone not in a major research lab or university. This includes the taxpayers who funded the research in the first place, the developing world, and the emerging world of nonacademic researchers and startup labs.

Breaking Down the Walls

To bypass the paywalls, in 2011 Alexandra Elbakyan started Sci-Hub, a website that provides free access to millions of otherwise inaccessible academic papers. She was based in Kazakhstan, far from the courts where academic publishers can easily bring lawsuits. In the movie Paywall, Elbakyan says that Elsevier’s mission was to make “uncommon knowledge common,” and she jokes that she was just trying to help the company do that because it seemed unable to do so itself. While Elbakyan has been widely criticized for her blatant disregard for copyright, Sci-Hub has become a popular tool among academics, even at major universities, because it removes the friction of paywalls and provides links to collaborators beyond them. She was able to do what the late Aaron Swartz, my Creative Commons colleague and dear friend, envisioned but was unable to achieve in his lifetime.

But, kind of like the Berlin Wall, the academic journal paywall can crumble, and several efforts are underway to undermine it. The Open Access, or OA, movement—a worldwide effort to make scholarly research literature freely accessible online—began several decades ago. Essentially, researchers upload the unpublished version of their papers to a repository focused on subject matter or operated by an academic institution. The movement was sparked by services like arXiv.org, which Cornell University started in 1991, and became mainstream when Harvard established the first US self-archiving policy in 2008; other research universities around the world quickly followed.

Many publications have since found ways to allow open access in their journals by charging an expensive (usually hundreds or thousands of dollars per article) “article processing charge,” or APC, paid by the institution or the author behind the research as a sort of cost of being published. OA publishers such as the Public Library of Science, or PLOS, charge APCs to make papers available without a paywall, and many traditional commercial publishers also allow authors to pay an APC so that their papers, though appearing in what is technically a paywalled journal, can be publicly available.

When I was CEO of Creative Commons a decade ago, at a time when OA was beginning in earnest, one of my first talks was to a group of academic publishers. I remember trying to describe our proposal to give authors a way to mark their works with the rights they wished to grant, including use of their work without charge but with attribution. The first comment from the audience came from an academic publisher who declared my comments “disgusting.”

We’ve come a long way since then. Even RELX now allows open access for some of its journals and uses Creative Commons licenses to mark works that are freely available.

Many publishers I’ve talked to are preparing to make open access to research papers a reality. In fact, most journals already allow some open access through the expensive article processing charges I mentioned earlier.

So in some ways, it feels like “we won.” But has the OA movement truly reached its potential to transform research communication? I don't think so, especially if paid open access just continues to enrich a small number of commercial journal publishers. We have also seen the emergence of predatory OA journals with no peer review or other quality control measures, and that, too, has undermined the OA movement.

We can pressure publishers to lower APCs, but if they retain control of the platforms and key journals, they will continue to extract high fees even in an OA world. So far, they have successfully prevented collective bargaining through confidentiality agreements and other legal means.

Another Potential Solution

The MIT Press, led by Amy Brand, and the Media Lab recently launched a collaboration called The Knowledge Futures Group. (I am director of the Media Lab and a board member at the press.) Our aim is to create a new open knowledge ecosystem. The goal is to develop and deploy infrastructure to allow free, rigorous, and open sharing of knowledge and to start a movement toward greater institutional and public ownership of that infrastructure, reclaiming territory ceded to publishers and commercial technology providers.

(In some ways, the solution might be similar to what blogging was to online publishing. Blogs were simple scripts, free and open source software, and a bunch of open standards that interoperate between services. They allowed us to create simple and very low cost informal publishing platforms that did what you used to have to buy multimillion-dollar Content Management Systems for. Blogs led the way for user generated content and eventually social media.)

While academic publishing is more complex, a refactoring and an overhaul of the software, protocols, processes, and business underlying such publishing could revolutionize it financially as well as structurally.

We are developing a new open source and modern publishing platform called PubPub and a global, distributed method of understanding public knowledge called Underlay. We have established a lab to develop, test, and deploy other technologies, systems, and processes that will help researchers and their institutions. They would have access to an ecosystem of open source tools and an open and transparent network to publish, understand, and evaluate scholarly work. We imagine developing new measures of impact and novelty with more transparent peer review; publishing peer reviews; and using machine learning to help identify novel ideas and people and mitigate systemic biases, among other things. It is imperative that we establish an open innovation ecosystem as an alternative to the control that a handful of commercial entities maintain over not only the markets for research information but also over academic reputation systems and research technologies more generally.

One of the main pillars of academic reputation is authorship, which has become increasingly problematic as science has become more collaborative. Who gets credit for research and discovery can have a huge impact on researchers and institutions. But the order of author names on a journal article has no standardized meaning. It is often determined more by seniority and academic culture than by actual effort or expertise. As a result, credit is often not given where credit is due. With electronic publishing, we can move beyond a “flat” list of author names, in the same way that film credits specify the contributions of those involved, but we have continued to let the constraints of print guide our practices. We can also experiment with and improve peer review to provide better incentives, processes, and fairness.

It’s essential for universities, and core to their mission, to assert greater control over systems for knowledge representation, dissemination, and preservation. What constitutes knowledge, the use of knowledge, and the funding of knowledge is the future of our planet, and it must be protected from twisted market incentives and other corrupting forces. The transformation will require a movement involving a global network of collaborators, and we hope to contribute to catalyzing it.

When a massive earthquake and tsunami hit the eastern coast of Japan on March 11, 2011, the Fukushima Daiichi Nuclear Power Plant failed, leaking radioactive material into the atmosphere and water. People around the country as well as others with family and friends in Japan were, understandably, concerned about radiation levels—but there was no easy way for them to get that information. I was part of a small group of volunteers who came together to start a nonprofit organization, Safecast, to design, build, and deploy Geiger counters and a website that would eventually make more than 100 million measurements of radiation levels available to the public.

We started in Japan, of course, but eventually people around the world joined the movement, creating an open global data set. The key to success was the mobile, easy-to-operate, high-quality but lower-cost kit that the Safecast team developed, which people could buy and build to collect data that they might then share on the Safecast website.

While Chernobyl and Three Mile Island spawned monitoring systems and activist NGOs as well, this was the first time that a global community of experts formed to create a baseline of radiation measurements, so that everyone could monitor radiation levels around the world and measure fluctuations caused by any radiation event. (Different regions have very different baseline radiation levels, and people need to know what those are if they are to understand if anything has changed.)

More recently Safecast, which is a not-for-profit organization, has begun to apply this model to air quality in general. The 2017 and 2018 fires in California were the air quality equivalent of the Daiichi nuclear disaster, and Twitter was full of conversations about N95 masks and how they were interfering with Face ID. People excitedly shared posts about air quality; I even saw Apple Watches displaying air quality figures. My hope is that this surge of interest in air quality among Silicon Valley elites will help advance a field, namely the monitoring of air quality, that has been steadily developing but has not yet been as successful as Safecast was with radiation measurements. I believe this lag stems in part from the fact that Silicon Valley believes so much in entrepreneurs that people there try to solve every problem with a startup. But that’s not always the right approach.

Hopefully, interest in data about air quality and the difficulty of getting a comprehensive view will drive more people to consider an open data approach over proprietary ones. Right now, big companies and governments are the largest users of data that we’ve handed to them—mostly for free—to lock up in their vaults. Pharmaceutical firms, for instance, use the data to develop drugs that save lives, but they could save more lives if their data were shared. We need to start using data for more than commercial exploitation: deploying it to understand the long-term effects of policy and to create transparency around those in power—not around private citizens. We need to flip the model from short-term commercial use to long-term societal benefit.

The first portable air sensors were the canaries that miners used to monitor for poison gases in coal mines. Portable air sensors that consumers could easily use were developed in the early 2000s, and since then the technology for measuring air quality has changed so rapidly that data collected just a few years ago is often now considered obsolete. Nor is “air quality” or the Air Quality Index standardized, so levels get defined differently by different groups and governments, with little coordination or transparency.

Yet right now, the majority of players are commercial entities that keep their data locked up, a business strategy reminiscent of software before we “discovered” the importance of making it free and open source. These companies are not coordinating or contributing data to the commons and are diverting important attention and financial resources away from nonprofit efforts to create standards and open data that would let us conduct research and give the public real baseline measurements. It’s as if everyone is building and buying thermometers that measure temperatures in Celsius, Fahrenheit, Delisle, Newton, Rankine, Réaumur, and Rømer, or even making up their own bespoke measurement systems without discussing or sharing conversion rates. While standardization would likely benefit the businesses, competing companies have a difficult time coordinating on their own and instead treat proprietary nonstandard improvements as a business advantage.
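To make the thermometer analogy concrete, here is a small Python sketch using the standard published conversions from Celsius; without shared conversion rules, the same physical reading looks like seven unrelated numbers:

```python
# Standard conversions from a Celsius reading into the other scales
# named above. The reading of 25 °C is arbitrary, for illustration only.

def from_celsius(c):
    return {
        "Celsius":    c,
        "Fahrenheit": c * 9 / 5 + 32,
        "Rankine":    (c + 273.15) * 9 / 5,
        "Réaumur":    c * 4 / 5,
        "Delisle":    (100 - c) * 3 / 2,
        "Newton":     c * 33 / 100,
        "Rømer":      c * 21 / 40 + 7.5,
    }

for scale, value in from_celsius(25.0).items():
    print(f"{scale}: {value:g}")
```

Interoperable data requires exactly this kind of agreed-upon, published mapping; a proprietary “air quality score” with no published conversion is a thermometer with a secret scale.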

To attempt to standardize the measurement of small particulates in the air, a number of organizations have created the Air Sensor Workgroup. The ASW is working to build an Air Quality Data Commons to encourage sharing of data with standardized measurements, but there is little participation from the for-profit startups making the sensors that suddenly became much more popular in the aftermath of the fires in California.

Although various groups are making efforts to reach consensus on the science and process of measuring air quality, they are confounded by these startups that believe (or their investors believe) their business depends on big data that is owned and protected. Startups don’t naturally collaborate, share, or conduct open research, and I haven’t seen any air quality startups with a mechanism for making data collected available if the business is shut down.

Air quality startups may seem like a niche issue. But the issue of sharing pools of data applies to many very important industries. I see, for instance, a related challenge in data from clinical trials.

The lack of central repositories of data from past clinical trials has made it difficult, if not impossible, for researchers to look back at the science that has already been performed. The federal government spends billions of dollars on research, and while some projects like the Cancer Moonshot mandate data openness, most government funding doesn’t require it. Biopharmaceutical firms submit trial data as evidence to the FDA—but, as a rule, not to researchers or the general public, in much the same way that most makers of air quality detection gadgets don’t share their data. Government-funded clinical trial data and medical research thus may sit hidden behind corporate doors at big companies. Locking up such data impedes the discovery of new drugs through novel techniques and makes it impossible for benefits and results to accrue to other trials.

Open data will be key to modernizing the clinical trial process and integrating AI and other advanced techniques used for analyses, which would greatly improve health care in general. I discuss some of these considerations in more detail in my PhD thesis.

Some clinical trials have already begun requiring the sharing of individual patient data for clinical analyses within six months of a trial’s end. And there are several initiatives sharing data in a noncompetitive manner, which lets researchers create promising ecosystems and data “lakes” that could lead to new insights and better therapies.

Overwhelming public outcry can also help spur the embrace of open data. Before the 2011 earthquake in Japan, only the government there and large corporations held radiation measurements, and those were not granular. People began caring about radiation measurements only when the Fukushima Daiichi site started spewing radioactive material, and the organizations that held that data were reluctant to release it because they wanted to avoid causing panic. However, the public demanded the data, and that drove the activism that fueled the success of Safecast. (Free and open source software also started with hobbyists and academics. Initially there was a great deal of fighting between advocacy groups and corporations, but eventually the business models clicked and free and open source software became mainstream.)

We have a choice about which sensors we buy. Before going out and buying a new fancy sensor or backing that viral Kickstarter campaign, make sure the organization behind it makes a credible case about the scholarship underpinning its technology; explains its data standards; and most importantly, pledges to share its data using a Creative Commons CC0 dedication. For privacy-sensitive data sets that can’t be fully open, like those at Ancestry.com and 23andme, advances in cryptography such as multiparty computation and zero knowledge proofs would allow researchers to learn from data sets without the release of sensitive details.
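The multiparty-computation idea can be illustrated with a toy additive secret-sharing scheme (a deliberately simplified sketch with invented numbers, not production cryptography): several data holders learn the sum of their private values while each party only ever sees random-looking shares.

```python
import secrets

MOD = 2**61 - 1  # large prime modulus; all arithmetic is done mod this

def share(value, n_parties=3):
    """Split `value` into n_parties random shares that sum to it mod MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

# Three hypothetical data holders secret-share their private counts.
private_values = [120, 75, 300]
all_shares = [share(v) for v in private_values]

# Party i adds up the i-th share from every holder; each of these
# partial sums individually looks like random noise.
partial_sums = [sum(s[i] for s in all_shares) % MOD for i in range(3)]

# Recombining the partial sums reveals only the aggregate, 495,
# never any holder's individual value.
total = sum(partial_sums) % MOD
print(total)
```

Real systems layer far more machinery on top of this, but the core trade is the same: researchers get the statistic, and the sensitive records never leave their owners.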

We have the opportunity and the imperative to reframe the debate on who should own and control our data. Big Data's narrative sells the idea that those owning the data control the market, and it is playing out in a tragedy of the commons, confounding the use of information for society and science.


When the Boston public school system announced new start times last December, some parents found the schedules unacceptable and pushed back. The algorithm used to set these times had been designed by MIT researchers, and about a week later, Kade Crockford, director of the Technology for Liberty Program at the ACLU of Massachusetts, emailed asking me to cosign an op-ed that would call on policymakers to be more thoughtful and democratic when they consider using algorithms to change policies that affect the lives of residents. Kade, who is also a Director's Fellow at the Media Lab and a colleague of mine, is always paying attention to the key issues in digital liberties and is great at flagging things that I should pay attention to. (At the time, I had no contact with the MIT researchers who designed the algorithm.)

I made a few edits to her draft, and we shipped it off to the Boston Globe, which ran it on December 22, 2017, under the headline "Don’t blame the algorithm for doing what Boston school officials asked." In the op-ed, we piled on in criticizing the changes but argued that people should criticize not the algorithm but the city’s political process, which prescribed the way in which the various concerns and interests would be optimized. That day, the Boston Public Schools decided not to implement the changes. Kade and I high-fived and called it a day.

The protesting families, Kade, and I all did what we thought was fair and just given the information that we had at the time. A month later, a more nuanced picture emerged, one that I think offers insights into how technology can and should provide a platform for interacting with policy—and how policy can reflect a diverse set of inputs generated by the people it affects. In what feels like a particularly dark period for democracy, and during a time of increasingly out-of-control deployment of technology into society, a lesson like this one has given me greater understanding of how we might more appropriately introduce algorithms into society. Perhaps it even gives us a picture of what a Democracy 2.0 might look like.

A few months later, having read the op-ed in the Boston Globe, Arthur Delarue and Sébastien Martin, PhD students in the MIT Operations Research Center and members of the team that built Boston’s bus algorithm, asked to meet me. In a very polite email, they told me that I didn’t have the whole story.

Kade and I met later that month with Arthur, Sébastien, and their adviser, MIT professor Dimitris Bertsimas. One of the first things they showed us was a photo of the parents who had protested against the schedules devised by the algorithm. Nearly all of them were white, even though white families represent only about 15 percent of the public school population in the city; the majority of families in the Boston school system are not white. Clearly something was off.

The MIT researchers had been working with the Boston Public Schools on adjusting bell times, including the development of the algorithm that the school system used to understand and quantify the policy trade-offs of different bell times and, in particular, their impact on school bus schedules. The main goal was to reduce costs and generate optimal schedules.

The MIT team described how the award-winning original algorithm, which focused on scheduling and routing, had started as a cost-calculation algorithm for the Boston Public Schools Transportation Challenge. Boston Public Schools had been trying to change start times for decades but had been stymied by the optimization problem: there was no way to improve the school schedule without tripling the costs, which is why it organized the Transportation Challenge to begin with. The MIT team was the first to figure out a way to balance all of these factors and produce a solution. Until then, calculating the cost of the complex bus system had been such a difficult problem that it presented an impediment to even considering bell time changes.

After the Transportation Challenge, the team continued to work with the city, and over the previous year they had participated in a community engagement process and had worked with the Boston school system to build on top of the original algorithm, adding new features to produce a plan for new school start times. They factored in equity—existing start times were unfair, mostly to lower-income families—as well as recent research on teenage sleep showing that starting school early in the day may have negative health and economic consequences for high school students. They also tried to prioritize special education programs and prevent young children from leaving school too late. They wanted to do all this without increasing the budget, and ideally to reduce it.

From surveys, the school system and the researchers knew that some families in every school would be unhappy with any change. They could have added constraints to the algorithm to limit some of the outlier situations, such as ending the school day at some schools at 1:30 pm, which was particularly exasperating for some parents. The solution they proposed significantly increased the number of high school students starting school after 8 am and significantly decreased the number of elementary school students dismissed after 4 pm, so they wouldn’t have to go home after dark. Overall it was much better for the majority of people. Although they were aware that some parents wouldn’t be happy, they weren’t prepared for the scale of the response from angry parents who ended up with start times and bus schedules that they didn’t like.

Optimizing the algorithm for greater “equity” also meant many of the planned changes were “biased” against families with privilege. My view is that the fact that an algorithm was making decisions also upset people. And the families who were happy with the new schedule probably didn’t pay as much attention. The families who were upset marched on City Hall in an effort to overturn the planned changes. The ACLU and I supported the activist parents at the time and called “foul” on the school system and the city. Eventually, the mayor and the city caved to the pressure and killed off years of work and what could have been the first real positive change in busing in Boston in decades.

While I’m not sure privileged families would voluntarily give up their good start times to help poor families, I think that if people had understood what the algorithm was optimizing for—sleep health of high school kids, getting elementary school kids home before dark, supporting kids with special needs, lowering costs, and increasing equity overall—they would have agreed that the new schedule was, on the whole, better than the previous one. But when something becomes personal very suddenly, people tend to feel strongly and protest.
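To see why an aggregate improvement can still leave individual families worse off, consider a toy weighted-objective scorer (invented weights and data, not the MIT team’s actual model): the solver picks the candidate with the best combined score, not the one that pleases every household.

```python
# Each policy goal gets a weight; each candidate schedule is scored on
# normalized (0-1) metrics, and the highest weighted sum wins.

WEIGHTS = {
    "hs_start_after_8am": 3.0,       # teen sleep health
    "es_dismissed_before_4pm": 2.0,  # young kids home before dark
    "cost_savings": 1.5,
    "equity": 2.5,
}

def score(schedule):
    return sum(WEIGHTS[goal] * schedule[goal] for goal in WEIGHTS)

candidates = [
    {"hs_start_after_8am": 0.4, "es_dismissed_before_4pm": 0.9,
     "cost_savings": 0.2, "equity": 0.3},  # roughly a status quo
    {"hs_start_after_8am": 0.8, "es_dismissed_before_4pm": 0.8,
     "cost_savings": 0.6, "equity": 0.7},  # an overall-better plan
]

best = max(candidates, key=score)
print(round(score(best), 2))
```

The second candidate wins on the weighted sum even though it scores slightly worse on one metric, which is exactly the shape of trade-off that upset individual families in Boston.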

It reminds me a bit of a study, conducted by the Scalable Cooperation Group at the Media Lab based on earlier work by Joshua Greene, which showed that people would support a self-driving car sacrificing its passenger if it would save the lives of a large number of pedestrians, but that they personally would never buy a passenger-sacrificing self-driving car.

Technology is amplifying complexity and our ability to change society, altering the dynamics and difficulty of consensus and governance. But the idea of weighing trade-offs isn't new, of course. It's a fundamental feature of a functioning democracy.

While the researchers working on the algorithm and the plan surveyed and met with parents and school leadership, the parents were not aware of all of the factors that went into the final optimization of the algorithm. The trade-offs required to improve the overall system were not clear, and the potential gains sounded vague compared to the very specific and personal impact of the changes that affected them. And by the time the message hit the nightly news, most of the details and the big picture were lost in the noise.

A challenge in the case of the Boston Public Schools bus route changes was the somewhat black-box nature of the algorithm. The Center for Deliberative Democracy has used a process it calls deliberative polling, which brings together a statistically representative group of residents in a community to debate and deliberate policy goals over several days in hopes of reaching a consensus about how a policy should be shaped. If residents of Boston could have more easily understood the priorities being set for the algorithm, and hashed them out, they likely would have better understood how the results of their deliberations were converted into policy.

After our meeting with the team that invented the algorithm, for instance, Kade Crockford introduced them to David Scharfenberg, a reporter at the Boston Globe who wrote an article about them that included a very well done simulation allowing readers to play with the algorithm and see how changing cost, parent preferences, and student health interact as trade-offs—a tool that would have been extremely useful in explaining the algorithm from the start.

The lessons learned from Boston’s effort to use technology to improve its bus routing system and start times provide valuable insight into how to ensure that such tools aren’t used to reinforce and entrench biased and unfair policies. These tools can absolutely make systems more equitable and fair, but they won’t succeed without our help.

The Next Great (Digital) Extinction »

How today's internet is rapidly and indifferently killing off many systems while allowing new types of organizations to emerge.

The Educational Tyranny of the Neurotypicals »

The current school system is too rigid, and it’s designed for a different world anyway.

Why Westerners Fear Robots and the Japanese Do Not »

The hierarchies of Judeo-Christian religions mean that those cultures tend to fear their overlords. Beliefs like Shinto and Buddhism are more conducive to faith in peaceful coexistence.

Blog DOI enabled »

As part of my work in developing the Knowledge Futures Group collaboration with the MIT Press, I'm doing a deep dive into trying to understand the world of academic publishing. One of the interesting things that I discovered as I navigated the different protocols and platforms was the Digital Object Identifier (DOI). There is a foundation that manages DOIs and coordinates a federation of registration agencies. DOIs are used for many things, but the general idea is to create a persistent identifier for some digital object like a dataset or a publication and manage it at a meta-level to the...

Fake Meat, Served Six Ways »

Cellular agriculture has the potential to protect animal welfare and curb global warming; Joi Ito, a former vegan, grapples with the future of meat.

Ding! Earned First Higher Degree. »

In 2011, when we announced that I would join the Media Lab as the new Director, many people thought it was an unusual choice partially because I had never earned a higher degree - not even an undergraduate degree. I had dropped out of Tufts as well as the University of Chicago and had spent most of my life doing all sorts of weird jobs and building and running companies and nonprofits. I think it took quite a bit of courage on the part of the Media Lab and MIT to hire a Director with no college degree, but once...

The Responsibility of Immortality: Welcome to the New Transhumanism »

What started as a dreamy movement of acid-tripping tie-dye wearers has become a mainstream lifestyle bet in Silicon Valley—and we must be responsible about how we wield this new reality.

AI Isn’t a Crystal Ball, But It Might Be a Mirror »

Using algorithms to predict crimes has created a biased system: Better to use AI for looking inward.

Citing Blogs »

On May 13, 2018, I innocently asked: “I may sound a bit naive, but as I read more academic papers in fields that I work in, I realize that they tend to cite academic papers more than blog posts even if there are better blog posts than the cited papers. It makes sense, but just noticing more specifically first hand.” — Joi Ito (@Joi) May 13, 2018. 240 replies later, it is clear that blogs don’t make it into the academic journalsphere, and people cited two main reasons: the lack of longevity of links and the lack of peer review. I would...