NP-Complete Breakfast

A Modest Proposal To Save the World


Warning: what follows are the hare-brained ideas of a mad (or at least fairly irate) scientist. Before undertaking any dangerous, untested geo-engineering project that has the potential to destroy life as we know it, you should consult with your physician, physicist, psychic, and/or psychiatrist.

I want to specifically emphasize that there are a lot of open questions and missing details in the ideas below. This essay is not meant as a solution (and definitely not as a magic bullet that removes the need for other solutions! If stupid conservatives glom onto this as a way to avoid responsibility I will be quite put out) but rather as an exhortation to undertake the research that can answer these questions while there is still time for the answers to be useful.

Climate change is the biggest catastrophe in history

There are so many terrible things happening all at once, it’s hard to keep track of all of them. Originally I had a long section here laying out how bad climate change is, but you probably know all of that already. (In case you managed to forget for a few seconds: the world is warming at an alarming rate. We just got the latest report on the danger, but there have been reports like that for a long time. It’s hard to imagine the emissions picture changing in the near term, and we only have about a decade if we want to avoid the worst consequences. We need a Green New Deal pronto, but even if we can start that tomorrow we should be thinking of what else we can do to reduce the damage.) I’ll keep it short and assume you’re on board with the title of this section.

The looming disaster scenarios and our collective failure to take action have led a lot of people to think about options that don’t involve reaching a global agreement on energy conservation. Most of these options involve ways of getting CO2 out of the atmosphere and sticking it somewhere safe. Unfortunately most of these options are pretty lousy. The big problem is scale. Since the industrial revolution we’ve added about 1600 gigatons of carbon dioxide to the atmosphere (source: this infographic and their sources). We continue to dump more at a rate of >30Gt CO2 per year (source: IEA).

If sequestration is going to work, it has to work at the multi-gigaton scale. I haven’t heard of any sequestration plan that can come close to capturing that amount of carbon: the typical proof of concept is minuscule, and realistic industrial application would only be a drop in the bucket compared to global emissions. In general, any energy-intensive method of sequestration runs into a green twist on the rocket equation: you’ll do a lot of work to offset all the carbon you emitted to do the sequestration itself, and if you aren’t extremely efficient this will massively increase the amount of effort needed in the end. (Another big strategy is to block some of the sunlight hitting us. I consider that plan slightly more crazy and outlandish than the one I’m going to outline below. And for what it’s worth, it would likely be far more expensive.)
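To make the "green rocket equation" concrete, here is a rough sketch rather than a real life-cycle model. It assumes that capturing one ton of CO2 causes a fixed fraction of a ton to be emitted by the capture process itself. That fraction (`overhead_fraction`) is a hypothetical parameter, not a measured value. Under that assumption, netting N tons means capturing N / (1 - f) tons in total.

```python
def gross_capture_needed(net_tons: float, overhead_fraction: float) -> float:
    """Tons that must be captured so that net_tons remain after subtracting
    the emissions caused by the capture process itself.

    overhead_fraction is a hypothetical parameter: tons of CO2 emitted per
    ton captured. At 1 or above the process never nets out.
    """
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return net_tons / (1.0 - overhead_fraction)


if __name__ == "__main__":
    annual_emissions_gt = 30  # Gt CO2 per year, the figure quoted above
    for f in (0.0, 0.25, 0.5, 0.9):
        gross = gross_capture_needed(annual_emissions_gt, f)
        print(f"overhead {f:.2f}: capture {gross:.0f} Gt to net {annual_emissions_gt} Gt")
```

The effort blows up as the overhead approaches 1, which is the sense in which inefficiency "massively increases" the work required.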

Furthermore, most technologies are not scalable as long as humans are needed: if sequestering one ton of carbon requires one person to work for one hour, we’ll need >15 million people working full-time to offset our current emissions, with no source of funding in sight. A practical sequestration strategy needs to be “too cheap to meter”—so efficient that the cost per ton goes to zero.
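The labor estimate above is simple enough to check by hand, but a short sketch makes the assumptions explicit. The 2,000-hour working year is my assumption for "full-time"; the one person-hour per ton is the hypothetical from the paragraph above.

```python
HOURS_PER_FULL_TIME_YEAR = 2000  # assumption: roughly 40 h/week for 50 weeks


def workers_needed(tons_per_year: float,
                   person_hours_per_ton: float,
                   hours_per_worker_year: float = HOURS_PER_FULL_TIME_YEAR) -> float:
    """Full-time workers required to sequester tons_per_year."""
    return tons_per_year * person_hours_per_ton / hours_per_worker_year


if __name__ == "__main__":
    emissions_tons = 30e9  # 30 Gt CO2 per year
    print(f"{workers_needed(emissions_tons, 1.0):,.0f} full-time workers")
```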

Cyanobacteria: processing carbon at scale

Perhaps predictably, this is where a biologist says that biology offers a solution. While biological organisms are frustratingly messy and chaotic and difficult to understand, they’re also amazingly efficient, at a scale that no human technology can touch. I’m certainly biased, but I can’t think of a more plausible way to sequester carbon than to engineer cyanobacteria (sometimes known as “blue-green algae”, but they’re bacteria rather than proper algae) to more efficiently capture sunlight and fix carbon into sugar, where it can be introduced to the food chain and sequestered for the long term by larger organisms.

A promising candidate for this type of project is Prochlorococcus marinus, a.k.a. photosynthetic picoplankton. These are some of the most abundant organisms on Earth, with an estimated \((2.8\text{–}3) \times 10^{27}\) living in the world’s oceans. To put that number in perspective:

Humans (est.)                                7,000,000,000
Human cells                  2,100,000,000,000,000,000,000
Prochlorococcus      3,000,000,000,000,000,000,000,000,000 (!)

At \(3 \times 10^{-14}\) g of carbon per cell (source: Cermak et al.), the existing population of picoplankton contains roughly 81 megatons of carbon. If they divide once per day, that adds up to about 30Gt per year. (This is a very rough estimate, but it’s on the same order as real research on the role of picoplankton in the carbon cycle: see Johnson & Zinser et al.) To offset our current emissions, we’d need to roughly double that cycling population, assuming the excess population was being absorbed into the oceanic ecosystem and thereby sequestered. (This is a huge assumption and one of those “open questions” I mentioned at the beginning.)
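For anyone who wants to poke at these numbers, here is a back-of-the-envelope sketch of the same arithmetic. The cell count, carbon per cell, and one-division-per-day rate are the rough figures quoted above. The carbon-to-CO2 factor is just the ratio of molar masses. Nothing here models what actually happens to the carbon afterwards.

```python
GRAMS_PER_MEGATON = 1e12          # 1 Mt = 10^6 metric tons = 10^12 g
CO2_PER_CARBON = 44.01 / 12.011   # molar mass of CO2 over molar mass of C


def standing_carbon_mt(cells: float, grams_carbon_per_cell: float) -> float:
    """Carbon held in the living population, in megatons."""
    return cells * grams_carbon_per_cell / GRAMS_PER_MEGATON


def annual_flux_gt(standing_mt: float, divisions_per_day: float = 1.0) -> float:
    """Carbon fixed per year in gigatons, assuming each division adds one
    standing stock's worth of new biomass (a very rough assumption)."""
    return standing_mt * divisions_per_day * 365 / 1000


if __name__ == "__main__":
    for cells in (2.8e27, 3e27):
        print(f"{cells:.1e} cells -> {standing_carbon_mt(cells, 3e-14):.0f} Mt carbon")
    stock = 81  # Mt of carbon, the figure used in the text
    print(f"{stock} Mt C is about {stock * CO2_PER_CARBON:.0f} Mt CO2")
    print(f"one division per day -> {annual_flux_gt(stock):.1f} Gt carbon per year")
```

The product of the quoted cell counts and carbon per cell lands at 84 to 90 Mt, the same ballpark as the 81 Mt used in the text.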

We know that this could at least theoretically work, because it’s the reason that our atmosphere isn’t full of carbon dioxide in the first place. Roughly 2.45 billion years ago, some early carbon-fixing organisms grew at such a prodigious rate that they converted our atmosphere from methane- and CO2-rich to O2-rich. This wiped out a lot of the existing (anaerobic) life on Earth, in what’s been called the Oxygen Catastrophe. Such is the power of little microbes in big numbers.

Update (2018-12-12): I made a mistake which I often make, which is that I conflated gigatons of carbon with gigatons of CO2. 81 megatons of carbon represents about 300 megatons of carbon dioxide, because most of the mass in the latter comes from oxygen. This is probably canceled out by an overestimate of the average growth rate, which is how I end up near the literature estimates for total production.

What’s even more amazing about this is that they pulled it off with such a lousy enzyme playing the key part.

RuBisCO is a bad enzyme and it should feel bad

The most abundant enzyme on Earth is RuBisCO: the protein responsible for converting carbon dioxide into something life can metabolize. The reason it’s so abundant is not only because of its important role—it’s because RuBisCO is not very good at capturing carbon dioxide, and it is usually the rate-limiting step in the photosynthetic pathway. RuBisCO’s inefficiency is largely because it has a difficult time distinguishing between carbon dioxide and oxygen—if you can imagine the shape of each molecule, you might see why they’d be hard to tell apart. If the enzyme accidentally grabs an O2 molecule, it still reacts, but it creates unproductive byproducts that the organism must spend additional energy cleaning up. It’s hard for the enzyme to become better at binding CO2 without also increasing this byproduct effect. Research suggests that most plants have a version of RuBisCO that is closely adapted to the CO2 and O2 concentrations of their natural habitat (see Studer et al.), and efforts to engineer the enzyme toward higher efficiency have been mostly fruitless (pun intended).

It seems strange that such an important enzyme would be so inefficient, but it makes a little more sense in light of the evolutionary history that led to its existence. At the dawn of photosynthetic life, there was essentially no molecular oxygen in the atmosphere and oceans. Thus, there wasn’t any pressure to become particularly selective for CO2 over O2, and organisms with RuBisCO were just as good as any alternative. Billions of years later, conditions are quite different, but there’s no evolutionary path for an organism to develop a better version of the enzyme. We’ll have to engineer one, instead.

In the oceans, dissolved CO2 exists in an equilibrium with carbonic acid (H2CO3, or H+ & HCO3- [a.k.a. bicarbonate]). This is the acid in the phrase “ocean acidification”, another downside to emitting gigatons of CO2. An enzyme called carbonic anhydrase catalyzes the interconversion between carbon dioxide and bicarbonate. Animals like us use this enzyme to maintain the pH balance in our blood and tissues, but plants also use it for something else: to increase the local concentration of CO2 in their chloroplasts, so that all of their RuBisCO enzymes can operate as efficiently as possible. In contrast with RuBisCO, carbonic anhydrases are incredibly good at their jobs: they are typically diffusion-limited, meaning that the enzyme works faster than the molecules involved can get out of the way. I find it a bit amazing to think about these two enzymes, interacting with such closely related molecules at vastly different levels of efficiency.
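Written out, the equilibrium described above is the standard carbonate system. Carbonic anhydrase speeds up the interconversion in both directions; like any catalyst, it does not change where the equilibrium sits.

\[
\mathrm{CO_2 + H_2O \;\rightleftharpoons\; H_2CO_3 \;\rightleftharpoons\; H^+ + HCO_3^-}
\]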

The three-legged stool of bio-geoengineering

Enough background. Here are the three steps to engineering more efficient photosynthesis into picoplankton. Each one of these steps is a major research project that should be spread over many groups, working collaboratively but not necessarily in coordination (as no one can predict what approach will work best). Given the current state of scientific knowledge I believe all three of these steps are possible, although it’s hard to guess how long they’d take or how much work they might require. I’ll introduce them in what I believe to be increasing order of difficulty:

1. Create a genetically isolated strain of Prochlorococcus marinus

Most of the fears about genetically modified organisms are nonsense—GMO crops are safe to eat and safe to grow, although they don’t make the problems of industrial agriculture go away on their own. In the case of genetically modifying bacteria, and especially in the case where we’re trying to engineer a completely new and potentially very powerful metabolic pathway, it’s much more reasonable to be concerned. For that reason, any work along these lines should take two different strategies to maintain isolation.

The first step is to synthesize a strain of Prochlorococcus with a new genetic code, one in which the codons for several pairs of amino acids have been swapped with one another (e.g. swap all the codons for alanine with those for serine, and modify the corresponding tRNA to match). The result of this recoding would be an organism that is genetically isolated from all other life on Earth: its own genome is indecipherable to any organism using the standard code, and any new DNA it incorporates will be likewise useless for its own translation machinery. Recoding several pairs of amino acids will prevent the engineered strain from ever sharing genes with other organisms in the ocean. (It’s important to do this thoroughly, because the number of these organisms is so huge that every mutation occurs almost immediately. At \(3 \times 10^{27}\) existing organisms in the ocean, I estimate that every set of three nucleic acids in the Prochlorococcus genome is mutated in at least one organism, every minute. Genetic isolation at that scale means setting up a wall of mutations that must all happen simultaneously for the progeny to remain viable.)
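Here is a toy illustration of why a swapped code isolates a genome, using the alanine/serine example. It is only a sketch: the codon table is a deliberately tiny slice of the standard genetic code, the demo gene is made up, and real recoding would involve several swapped pairs plus matching tRNA changes.

```python
# Codons for the two amino acids in the example swap (standard genetic code).
ALANINE = ("GCT", "GCC", "GCA", "GCG")
SERINE = ("TCT", "TCC", "TCA", "TCG", "AGT", "AGC")

# A tiny slice of the standard code: just enough for the made-up demo gene.
STANDARD_CODE = {"ATG": "M", "AAA": "K", "TAA": "*"}
STANDARD_CODE.update({codon: "A" for codon in ALANINE})
STANDARD_CODE.update({codon: "S" for codon in SERINE})


def swap_amino_acids(code: dict, first: str, second: str) -> dict:
    """Return a new code in which the codons of two amino acids trade places."""
    exchange = {first: second, second: first}
    return {codon: exchange.get(aa, aa) for codon, aa in code.items()}


def translate(dna: str, code: dict) -> str:
    """Translate a DNA string codon by codon using the given code."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(code[codon] for codon in codons)


RECODED = swap_amino_acids(STANDARD_CODE, "A", "S")

if __name__ == "__main__":
    wild_gene = "ATGGCTTCAAAATAA"  # hypothetical gene written in the standard code
    print("standard machinery reads:", translate(wild_gene, STANDARD_CODE))
    print("recoded machinery reads: ", translate(wild_gene, RECODED))
```

The same DNA yields a different protein depending on which machinery reads it, so a gene that crosses the boundary in either direction arrives scrambled.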

The second step, for use during research and development of the organism, is to use non-standard amino acids (NSAAs) to keep this strain confined to the lab until we are confident it should be released (see step 3). Using an NSAA is a good way to keep something from growing outside the lab, but in this case we need our strain to grow outside the lab, so it can’t be the long-term solution. At the scale necessary for carbon sequestration, we wouldn’t be able to keep feeding it synthetic nutrients. On the other hand, recoding the genome only keeps it genetically isolated, not physically. So both measures are necessary until the design is complete.

Those tasks might seem like complete science fiction, but I list them first because all of the pieces have already been demonstrated by existing researchers. They’ve synthesized entire genomes on a similar scale to that of Prochlorococcus. They’ve introduced synthetic metabolites to create strains that can’t grow without specific additives. Researchers have even demonstrated that recoded organisms can be genetically isolated from horizontal gene transfer, although not yet at the scale needed for this project. Creating a fully codon-swapped genome for Prochlorococcus is a tall order, but it’s not too far beyond the cutting edge of research being done today. It would take time and money to get it done, but it’s well within the realm of possibility. (Which is pretty amazing, when you realize that the idea is to create organisms with an entirely new genetic code.)

2. Build a better RuBisCO

The second step is a lot more audacious, but we’re closer to having the tools than ever before. The basic goal is to change the beginning of the carbon-fixing pathway from CO2 to HCO3-, which we know can be bound with high efficiency by carbonic anhydrase. This might sound simple to you (or impossibly difficult, depending on your background), but engineering new enzymes has proven to be extremely challenging, even as our ability to engineer new proteins and simulate protein structure has gotten much better. That’s partially because enzymes are dynamic machines that go through conformational changes as they catalyze a reaction—we can design a protein that’s very stable in one conformation, but getting it to cycle between multiple different states is much trickier.

I’m no expert in this field, but I do know some experts (largely skewed towards UCSF, because that’s where I did my graduate work). In my estimation the front-runners in de novo protein design are those working in the Rosetta family (started by David Baker, now at the University of Washington), using machine learning and clever search algorithms to explore protein space. (Tanja Kortemme’s group has done research on stabilizing multiple conformations simultaneously as a way to design new enzymes.) There are many other promising areas of research, though, including new ideas in molecular dynamics (e.g. Michael Grabe’s group has done research on efficiently sampling the structures of enzymatic processes using molecular dynamics). Just recently Google announced the results of their AlphaFold project, which is using deep learning to predict protein structure from sequence. Their initial results look incredibly promising, although it’s not clear if their approach is useful for designing new proteins.

Beyond the computational work, there are also many groups developing high-throughput methods for synthesizing and testing proteins. Almost every computationally-designed protein needs to go through lots of optimization in a real organism before it reaches its full potential, and these methods are going to be at least as important for developing our carbon-fixing machine. (I know even less about this topic but I’ll mention Polly Fordyce and Jennifer Cochran, both at Stanford, as two investigators doing amazing stuff in this area.)

It’s hard to guess at how long this step might take, or how much it would take in terms of resources. It might turn out to be unfeasibly complicated, as one new enzyme requires another and another, multiplying the complexity. It might turn out to be entirely impossible, but I don’t think that’s the case. But we’ll never have any idea if we don’t put the resources into finding out. The worst-case risk is that we learn something useful about enzyme design and carbon metabolism, which isn’t too bad all things considered.

Strictly in terms of money, funding researchers to do this kind of work is fairly cheap—maybe a few billion dollars over a decade. Compared to the cost of cleaning up climate-related disasters, it’s trivial. (The damage from the most recent California wildfire is estimated at $7.5-10 billion, and that itself is small compared to a bad hurricane or flood.) This is a gamble that may not pay off the way we hope, but we’re getting some very good odds.

3. Establish ecological efficacy and safety

I list this step last because it’s the most important as well as the most difficult. Releasing an engineered organism into the world’s largest ecosystem is not a decision to take lightly. We should only do so if we have thoroughly explored the risks involved and weighed them against the potential benefits.

The potential benefits are pretty clear, although we’ll have to make a level-headed assessment of how likely they are. Doubling the flux of carbon into the ocean is not going to fix the climate instantly, and we probably have many years of warming to go even in the best case. But we could hope to stabilize ocean acidity, which could help ameliorate the bleaching of coral reefs and the threat to many crustaceans. Depending on the amount of flux, we might hope to not only stabilize our carbon emissions but to pull out some of the excess CO2 emitted over the past two centuries.

Properly estimating the potential benefit is work for ecologists, climatologists, and geologists—the same people who are already working to help us prepare for the next century of climate change. There is a lot of research out there estimating how carbon is processed in the ocean, how it filters through the ecosystem, and how it is eventually either sequestered or is re-released into the atmosphere. I won’t attempt to summarize said research, but all of it will be important for modeling the effects of such a large intervention in the environment.

The potential risks are more open-ended. There are ecological risks that would need to be assessed—hopefully the fact that Prochlorococcus is at the bottom of the oceanic food chain would minimize some risk, but it’s not clear. It still exists in an ecosystem that may be thrown out of balance by this new arrival (for example, Prochlorococcus coexists with Synechococcus and the relationship between the two is unclear). CO2 is not usually the limiting nutrient for plankton, so one question is whether there is even capacity in the ocean for such an expansion of the picoplankton population. (But it’s worth noting that a vastly-more-efficient carbon fixation pathway could change the calculation for limiting nutrients.)

Perhaps the most obvious risk is in succeeding too well, and pulling more CO2 out of the atmosphere than we ever put in. Previously I mentioned the Great Oxygenation Event (a.k.a. the Oxygen Catastrophe), when early photosynthesizing organisms removed nearly all the methane and carbon dioxide from the atmosphere. The effect of this was a runaway ice age, the Huronian glaciation, that lasted 300 million years and caused mass extinctions. Obviously we don’t want to do that. (This documentary suggests it wouldn’t be pleasant.)

There are a variety of different solutions for this that need to be explored. We could try to engineer a limited number of divisions into our new strain of picoplankton, although with such a huge number of divisions it’s very likely to mutate away from any biological switch. (Jumping to a different domain of life: in organisms like ourselves, the ends of our chromosomes (telomeres) get shorter every time our cells divide, and continual growth requires enzymes called telomerases to lengthen them. If we encoded a telomerase to require NSAAs, the organism could be grown in the lab with synthetic nutrients but would have a ticking clock as soon as it was unable to extend its telomeres.) We could strive to engineer our carbon-fixing enzyme very carefully, such that it loses efficiency as the carbon concentration goes back to normal levels. Again, the organism will have strong incentive to mutate away from any obstacles, but it’s possible that we could engineer a local minimum that was difficult to escape. In either case, we can use a version that relies on non-standard amino acids to test its ability to evade our control mechanisms—NSAAs aren’t economical at global scale but could be used to grow trillions of organisms in a controlled setting so that we can explore its mutational landscape.

No easy answers

In the end, there’s no way to prove that a geoengineering approach like this is risk-free. On the other hand, the risk of doing nothing is extreme. As a global society we will likely be faced with choosing between a known-dangerous future or a potentially-dangerous solution. I don’t know how to answer that question for anyone else, except to advocate that we start exploring the options as soon as possible, before it’s too late. And while the outline I just laid out is pretty crazy, it seems a lot saner than hoping the world is going to fix itself anytime soon. In summary:

I know I just met you,

and this is crazy,

but we’re facing the biggest ecological disaster in recorded history,

so let’s try this, maybe?

Journals! What Are They Good For?


Here’s my publishing manifesto, or something like that. Not that I have any of this stuff figured out, of course. But I feel like I need to plant a flag somewhere, so I have something to point at.

Step 1: PPPR

I’m not going over the reasons that the current scientific publishing system is broken. I’ll start by assuming that you, the reader, are basically on board (or at least pretty familiar) with these ideas (assuming you’re some kind of scientist; if you aren’t then all of this might be pretty confusing):

  1. Our work should always be open-access.
  2. In fact, we should publish pre-prints so that our work gets out before the review process is finished.
  3. And hey, if we’re going to put our work out there as a pre-print, we might as well do the reviewing in public, too.

For brevity I’ll refer to that train of thought as the Post-Publication Peer Review (PPPR) scheme. You can make a strong argument that this scheme leads to all kinds of benefits: faster scientific progress, a more level playing field, wider dissemination of knowledge, and so on. (There are also some criticisms and concerns with how viable the PPPR scheme is, but I don’t think I can do them justice so I won’t try to summarize them. You can read a more thorough discussion about this in this blog post from Michael Eisen, or honestly in almost any random blog post from Michael Eisen.)

There have been some cool ideas that grow out of this, like the concept of an overlay journal. This is a journal that points to existing preprints rather than re-formatting and re-publishing them on its own servers or even (if you can imagine such a thing) on paper. An example of an overlay journal that points to arXiv prints is Discrete Analysis, but they certainly aren’t the first or the last.

Step 2: ???

I have drunk copious amounts of the PPPR Kool-Aid™—I’m convinced that we need to publish early and do so in an open manner. My main concern (and the only thing that makes this post different from any other post about PPPR) is about what PPPR doesn’t fix, and about what arises from the rubble of the current system of journals.

Getting Scooped Still Sucks

One of the main concerns people have with pre-prints is that it will cause them to be scooped—they’ll put their research out there, some other lab will read it, and that lab will race to publish the same work “first” in a refereed journal. (Some of this belongs in the possibly-mythical post about academia that I’m planning to write.)

This is a pretty silly concern because you can’t scoop something that’s already published (and pre-prints are published). Furthermore, malicious scooping is probably quite rare—it’s more likely that the other lab just doesn’t know what you’re working on. (I’ll get to that problem in a minute.)

But those answers sidestep a bigger issue, which is that scooping shouldn’t be a thing in the first place. Research projects take years, but being a month later to publication is called “being scooped”. That creates a crazy environment for doing research, and we all are worse off for it:

  • It’s ridiculously stressful, particularly for less-established scientists and trainees. Through no fault of their own, their career trajectory can be permanently altered. This kind of thing drives people out of science.
  • It discourages labs without major resources from working in competitive areas. These are often the areas where we’d benefit most from fresh ideas.
  • When there is an “obvious” advance to be made, there’s a mad rush to get there first and claim credit. This is a waste of time and resources, it creates dangerous incentives for doing bad science, and it leads to widely-covered multi-million-dollar patent disputes.

PPPR doesn’t address this issue, so far as I can see. Instead of being scooped when you see the work in print, it happens when you see the work as a preprint. If we assume that preprints are deposited at about the same point that papers get submitted for review, this means a difference of several months but rarely more. In the cases when getting scooped really hurts—after multiple years of effort—it’ll still really hurt. (The focus on priority is still relevant to questions of intellectual property and patents and whatnot. There’s a whole secondary discussion to be had about those issues. They’ll probably continue to exist regardless of the system we adopt, but first-to-file may have obviated some of the problems there.)

Whatever system we have, it shouldn’t punish people for finding similar results simultaneously. If anything we should celebrate such discoveries: they are wholly independent replications of the finding! We don’t get enough of that as it is.

Redundant Work is a Waste

While it’s great to get independent validation of a result, it would likely be even better if those groups knew they were working on the same topic, and instead of competing with each other to publish first, they formed a loose collaboration and worked together. This doesn’t need to be a formal endeavor—tight collaboration requires a lot of work to maintain, and when done poorly it leads to a lot of headaches. A loose collaboration is more about sharing protocols, preliminary and negative results, and ideas. This can be common within an institution but is typically rare when there is no pre-existing relationship between labs.

Often a type of collaboration can happen late in the game: two groups will learn of each other and try to publish together. That might not involve any real exchange of knowledge, and barely qualifies as collaboration in my mind. (It might be better described as collusion.) Those groups are trying to solve the prisoner’s dilemma that comes from the threat of scooping—if one publishes first they get more glory, but the arbitrary delays of publication mean they risk losing it all to the other group. Publishing together is safer. But most of the hard work was done in parallel, without any communication between labs. This is a waste of a lot of people’s time and money, and leaves a more confusing scientific literature for everyone to look at later.

Step 3: Non-profit!

Here is where I am supposed to lay out a proposal for solving those problems. Unfortunately I don’t really have one. I do have some vague ideas about how I wish research worked, and I’ll outline those ideas here.

Open Notebooks

Rather than doing our work privately and then publishing a complete story at the end, we should be transparent about the (usually messy!) process, and highlight the results we think are notable enough to share with the community.

I’m hardly the first person to suggest this—many scientists are already doing it. (Which is super impressive—it’s scary to put your work out there that way.) To change the culture of research, however, it can’t just be a handful of idealists: it needs to be the standard practice. Unfortunately there isn’t a whole lot of reason to do it. The potential benefits for other people are easy to imagine:

  • Access to data as soon as they are collected allows other researchers to build on results quickly.
  • Many eyes on data helps prevent errors and allows for novel interpretations, free from the bias or preferred outcome of the experimenter.
  • Public notebooks prevent p-hacking and all kinds of other shady practices.
  • It also provides an honest account of the progression of the research, instead of the Newspeak-like “we have always been working on protein X” narrative of a paper.

That last point is important: being more open about how research is done is good for public engagement and it’s good for the mental health of trainees, who get to see that their own experience is the norm. Research is hard and the research community should stop pretending otherwise.

To be clear, I realize that the paper lab notebook isn’t going away. It’s just too useful, and paper tends to last longer than bits despite our best efforts. We still need a way to share our progress as it happens.

What’s less clear is why any individual would pursue this strategy. Beyond gaining some recognition and reputation for having the chutzpah to be open, it seems to present much more risk than reward. This is where the whole ecosystem needs to change, in a major way: we need a system that allows open collaboration at any level of engagement, and can aggregate that collaboration and contribution into a CV that the researcher can point to in the future. Luckily, we have a reasonably good model for what this could look like.

The GitHub Model

Open source software provides a model, albeit imperfect, for how to build this ecosystem. GitHub in particular is the de facto clearinghouse for a developer to display their credentials. (To be honest I don’t know how true this is nowadays.) Every contribution is recorded: from personal projects and major contributions to public resources, all the way down to opening a bug report on an obscure repository. Moreover, the quality of the work can be inspected: one can read the code that they are writing and judge its quality. One can see how they interact with other projects and whether they provide helpful feedback or pointless criticism.

The equivalent ecosystem for researchers is really quite similar: a site (or a federation of sites!) where researchers can deposit their data, analyze and discuss their results, and collaborate with others at many levels. (Just as with code, there will always be “private repos” of work that is not yet ready to be released. But as the ecosystem evolves I think these will become less common.) Everyone starts with a mess and refines it slowly—everyone mislabels data and makes mistakes and recalculates. These mistakes are normal and don’t need to be hidden, and starting from scratch in the open allows us to build bootstrapping materials that get projects started faster. Collaboration could range from helpful trouble-shooting to in-depth peer review—with open data and reproducible analysis pipelines, peer review can involve independent validation and reproduction of the analyses. Every researcher always has an up-to-date CV that includes their own work, their collaborations big and small, and their service in the form of peer review and any other contributions to the scientific community.

Journals: what are they good for? Journalism!

To return to the title of this post: whither the existing journals in this future utopia? It’s a common expectation that PPPR will make journals obsolete, because the ecosystem of preprints and public review will destroy the need for subscriptions. (This is mentioned in Eisen’s post, for instance.) I don’t know if it would, but I definitely don’t think it should do such a thing. Journals are legitimately useful in many ways: they highlight notable research (as best they can); they report on news relevant to their target community; and they provide a secondary perspective, typically from an eminent third-party, for particularly interesting work. Journals also tend to publish reviews which can provide a useful overview of a given topic—typically these reviews are invited publications and thus fall outside of the typical peer-review process.

All of these services are useful for researchers, and it’s not difficult to imagine a journal that provides only those services: one that highlights notable work (in preprint form or even straight from a public notebook) rather than trying to filter out the most “impactful” work from an ocean of research. A journal that focused on science journalism would be less biased by famous names and cozy relationships (but surely still biased, just like any media can be). It would be reacting to impactful research rather than decreeing what research should be impactful.

In this scenario it seems likely there would be far fewer journals—given the vast number of predatory and/or poorly-edited journals, this is probably a good thing. Journals that provide a value-add beyond “you’re published” will still be viable—certainly the likes of Science and Nature will stick around to report on matters of interest to the entire scientific community. This will include reporting on the most notable research being done, but their role will be quite different: they will be obligated to discuss the work that the community deems notable, rather than being the gatekeepers who selected it.

Behold: the QR card


I’m inordinately pleased with this. (That’s me, in case you can’t tell.)

How it works

It turns out that getting an arbitrary image into a QR code is annoyingly difficult. Or at least it took me quite a while to do it—partially due to inertia, as it required the concentrated boredom of a trip home for the holidays to get me focused enough to crack it. In the end, the code is fairly simple.

In short, I had to reverse-engineer a QR code library, figure out where the data was stored, put in the data I wanted, and then calculate the corresponding error correction code so that the thing would scan properly.

In more detail

The whole thing fits in a GitHub Gist, if you don’t include the required python-qrcode library. The workflow is fairly simple:

  1. Get the layout of the QR code. This took the longest to figure out. The simplest way was to create a blank code.
  2. Load an image and extract the data from it, using the coordinates from 1.
  3. For each of the bit masks:
    • Decode the data from the image, insert the desired URL, and correct any disallowed or undesirable bits
    • Recode the data and calculate the correct EC bits
    • Insert data and EC into the image and display it
  4. Choose which of the masks is best
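The mask loop in step 3 can be sketched without any QR library. This is only an illustration of the idea, not the code from the Gist: the eight mask conditions are the standard QR ones, but `apply_mask`, `mismatch`, `best_mask`, and the `recode` callback are hypothetical names, and scoring a mask by how many modules differ from the source image is my assumption about what "best" means. The real URL insertion and error-correction step is stood in for by `recode`.

```python
# The eight standard QR mask conditions, for a module at row i, column j.
# A module is flipped wherever the condition is true.
MASKS = [
    lambda i, j: (i + j) % 2 == 0,
    lambda i, j: i % 2 == 0,
    lambda i, j: j % 3 == 0,
    lambda i, j: (i + j) % 3 == 0,
    lambda i, j: (i // 2 + j // 3) % 2 == 0,
    lambda i, j: (i * j) % 2 + (i * j) % 3 == 0,
    lambda i, j: ((i * j) % 2 + (i * j) % 3) % 2 == 0,
    lambda i, j: ((i + j) % 2 + (i * j) % 3) % 2 == 0,
]


def apply_mask(bits, coords, mask_id):
    """XOR each bit with the mask at its module position.

    Masking is its own inverse, so the same function decodes image
    pixels into data bits and encodes data bits back into pixels.
    """
    mask = MASKS[mask_id]
    return [b ^ int(mask(i, j)) for b, (i, j) in zip(bits, coords)]


def mismatch(rendered, target):
    """Count modules where the rendered code differs from the image."""
    return sum(r != t for r, t in zip(rendered, target))


def best_mask(target_pixels, coords, recode):
    """Try every mask and return the id that distorts the image least.

    `recode` is a hypothetical callback standing in for "insert the URL
    and recompute the EC bits"; it takes and returns a list of data bits.
    """
    scores = []
    for mask_id in range(len(MASKS)):
        data = apply_mask(target_pixels, coords, mask_id)  # decode image
        data = recode(data)                                # URL + EC step
        rendered = apply_mask(data, coords, mask_id)       # back to pixels
        scores.append((mismatch(rendered, target_pixels), mask_id))
    return min(scores)[1]
```

Here `coords` plays the role of the layout from step 1: the (row, column) of each data module, in the order the bits are read.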

More Glasseye


Seems like I’m still pretty bad at updating this thing. Partially because I’m sure no one is reading it. Something of a Catch-22 there. So here’s another website-customization post!


Tufte CSS suggests that you break articles up into sections, with a little bit of padding around them. That was a little tricky to do here because there’s no section syntax in Markdown, but since I don’t use horizontal rules, I can use that syntax to designate the beginning and ending of sections by replacing the resulting tag in my fork of Glasseye. It’s a little hacky but it’ll do. I’d prefer to use BeautifulSoup to do the processing, but I didn’t see an easy way to wrap lots of tags into a section.
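For the curious, the horizontal-rule trick might look roughly like the sketch below. This is not the code in my fork: `wrap_sections` is a made-up name, and it assumes the Markdown renderer emits `<hr>` or `<hr />` for a horizontal rule and that every rule simply separates two sections.

```python
import re

# Matches the tag a Markdown horizontal rule typically renders to.
HR = re.compile(r"<hr\s*/?>")


def wrap_sections(html):
    """Split rendered HTML at each <hr> and wrap every piece in <section>."""
    parts = [p.strip() for p in HR.split(html)]
    # Skip empty pieces so a leading or trailing rule adds no empty section.
    return "\n".join(f"<section>\n{p}\n</section>" for p in parts if p)
```

Working on the rendered string keeps it to a few lines, at the cost of being blind to an `<hr>` that shows up inside a code sample.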

Because this post is pretty small (and also because I found a typo so I was editing it anyway), I broke up the Team Science post into sections as well.

I couldn't tell you why I made this


Once I thought of it, I had to do it, and this is the least-shameful place to post it. Plus I get to test out the \(\LaTeX\) formatting here.

\[James^TWebber = \begin{bmatrix} JW & Je & Jb & Jb & Je & Jr\\ aW & ae & ab & ab & ae & ar\\ mW & me & mb & mb & me & mr\\ eW & e^2 & eb & eb & e^2 & er\\ sW & se & sb & sb & se & sr \end{bmatrix}\]

Go Team Science! (Or not)


I don’t seem to have a handle on this blogging thing, yet. I have a long and rambling draft for a post about publishing, which has been languishing for months now. Meanwhile the NIGMS just released a Request-For-Information on “Team Science”. That seems as good an excuse as any to write something here. They even provided me with a nice outline, and an introduction:

The National Institute of General Medical Sciences (NIGMS) seeks input on team-based scientific research and the manner in which this activity could be supported by the Institute. This Request for Information (RFI) will assist NIGMS in considering needs and opportunities in team science at the multi-investigator, institutional, regional and national level. Comments are invited from any interested party.

Hey, I’m an interested party, so let’s do it. The RFI is broken into six areas, and I’ll structure my post likewise. But first, a general overview of where I’m coming from (since I haven’t posted anything else on this blog and you might not know). (On the off chance that the NIGMS doesn’t read this blog, I’ll also email this to them.)

The Short Version

The idea of team science (that is, big interdisciplinary groups working on large projects) sounds great, but when it gets implemented there are some challenges. Most of those challenges boil down to how we publish science, how we evaluate scientists, and how we train students and evaluate their work. All of these issues deserve their own essays, but in short:

  • A whole lot of science gets done by trainees—post-docs and graduate students
  • Early-career scientists are evaluated on the basis of their personal accomplishments. Around here that almost always means your own project and a publication with your name at the front (or at the end, for new PIs).
  • Team science requires teamwork. It’s really hard to collaborate in such a way that everyone involved can have their own publication at the end.

Because of those points, I don’t think team science is a good use of funding. The goals of such a grant are not aligned with the interests of the individual labs. Unless universities and funding agencies are going to seriously change how they evaluate scientists, big teams don’t really work in academia.

So, after laying down that thesis statement, I’ll go over the sections of the RFI.

Interest in team science

Comment on the appropriateness and usefulness of team-based research to address particular areas of biomedical science. This may include comments on the relative importance of team science in your field and your own experiences with team science.

Not just appropriate and useful, I think team-based research is essential! There are so many open questions in biology that involve the intersection of many fields. The big challenge is in integrating the collective knowledge of so many different specialties.

I don’t think that big grants given to specific groups are the best way to achieve that goal. In any funding system it’s hard to pick winners—with a smaller number of large grants, that problem is compounded. The big-grant approach heavily favors the large institutions (like UCSF, for one; we have a lot of these grants) that have faculty studying a wide range of related topics and can come together to apply for major grants.

The most important thing the NIGMS can do to support team science is to set up and administer common resources. Rather than try to fund large consortia, build scientific clearing-houses that facilitate teamwork. And revamp funding criteria to reward scientists that play supporting roles on many projects.

Management and advisory structures in team science

Comment on the types of management structures within a project that would enable an effective team science program. This may include the organizational structure, leadership models, and use of advisory boards or external review groups. Comments on challenges and solutions for issues such as the training needed for effective team science leadership, approaches for maintaining communication within and between teams, and strategies for maintaining team effectiveness are welcome.

Alluded to in the previous section: the best way to manage and advise teams of scientists is to let them form on an ad-hoc basis, without management. Fund individual PIs on the merits of their own work (I think the MIRA is a great start), including their service to helping other labs. If the NIGMS provides support to make team-based science efficient and rewarding, PIs will figure out how to do it on their own.

Team composition

This may include comments on recruiting team members, the importance of training students or mentoring junior PIs involved in team science, the value of diversity in team science, and the challenges of recognizing individual efforts on team-based research within a university or research institute setting.

As I said at the top, I don’t think trainees and junior PIs can thrive as cogs in a huge machine. But in a decentralized system they can be cogs in a smaller machine: their lab. If it is possible to reach out to collaborators easily, and they have incentives for providing assistance, it should be possible for small labs to engage in team science without being lost in the crowd.

Resources and infrastructure

Comment on the resources and infrastructure that are needed to support team science, including teams consisting of groups from multiple institutions. This may include comments on technical and administrative cores, both those currently available at your institution and those that would need to be established to support team science.

This is where NIGMS can make the biggest impact, but not necessarily at the institutional level. Institutions should be able to facilitate collaboration within their own walls. What the NIGMS can facilitate is collaboration between institutions, by helping groups share results and feedback. One way to do that is to build common resources for sharing results, methods, reagents, and data. The NCBI does this for lots of computational data (although it could be much improved) and they’re an indispensable part of biomedical research because of it. Whenever the NIGMS identifies a problem worthy of targeted grants, they should consider what common resources they can provide to the community, rather than assembling a dream team.

Collaboration is hard for much the same reason that team science in general is hard: it is difficult to align the interests of two research groups so that they work together effectively. I think the major obstacle is that the minimum unit of research that is useful to a scientist is the publication, and a publication is a large investment of time and effort. This means that the activation energy for collaborating is very high. (I had more to put here, but the short version is: the farther you get from first/corresponding author, the less likely it’ll be worth the effort, and so it’s a lower priority. Thus there’s a limit to the size of collaborations.)

If the NIGMS really wants to encourage team science in a systematic way, it needs to develop a way to track, quantify, and reward collaboration at all levels. Providing a small part of ten projects must be as valuable as spearheading one. And this contribution can’t just be recorded in author lists, which are subjective and inaccurate. We need a better way to record who helped with what on which piece of research, so that we can evaluate and reward the scientists who make an impact in many different areas: the team players.

Assessment of team science

This may include factors that should be considered in the peer review of team science-based grant applications, the value of interim reviews during the funding period, the importance of outreach activities, and the appropriate quantitative and qualitative measures of the success and impact of team science.

This bit goes out the window if you take my advice and stop providing grants to giant groups of PIs. Reward individual PIs for their work, based on its scientific merit. And don’t do it based on specific project proposals, because they never work on what they say anyway.

Comments on past or current NIGMS team-based programs and funding mechanisms

Comments are welcome on the advantages or disadvantages of programs and grant mechanisms such as the National Centers for Systems Biology (P50), Centers for Injury and Peri-Operative Research (P50), Centers for HIV/AIDS-Related Structural Biology (P50), COBRE Centers (P20), Glue Grants (U54), the Protein Structure Initiative (U54), and Program Projects (P01).

UCSF has/had two P50 grants that I’m somehow associated with: one for a Systems Center and one for a HARC Center. My impression of those grants is that they certainly funded a bunch of projects, but they weren’t the proximal cause of any team science. PIs have areas of research and trainees need their own publications. Collaborations happen when they further those goals and not otherwise.

Many projects at UCSF involve multiple labs working together, but that tends to be determined (and should be determined) by the scientific questions, not by the funding source. If the P50 grants were replaced with stable funding for many labs, I don’t think the outcome would be less collaborative, and the scientific output would likely be higher.

How is any of this different from the status quo?

I’m not sure! In the end, I’m suggesting that the NIGMS step back from these team science grants and focus on labs. I think the biggest things I propose are somewhat tangential to this RFI: fund labs rather than projects, reward small units of collaboration/supporting work, and good things happen.

But that’s why I used this RFI as inspiration for a blog post in the first place: the questions NIGMS is asking were about the topics I wanted to discuss anyway.

Coming Soon (maybe)


I made this blog because some stuff doesn’t fit in a tweet, so I should probably write something, huh? Here are the main things I want to write, eventually:

  • Something about publishing
  • Something about graduate education
  • Something about the research enterprise in general
  • (Maybe) something about statistics

If I ever write these posts, I’ll update this one to reflect that.

Glasseye Upgrades


I fixed the way that Glasseye handles sidenotes and margin notes, so that they auto-hide (with a pop-up link) when the screen is too narrow to accommodate them. Job well done.

At this point my “Glasseye” implementation is not really similar to the original code, so I’ve created a GitHub project for this site. In case you wanted to see how the magic happens.

I imagine that at some point I may even break that plugin out into yet another repository, because there are still some tweaks I want to make…we shall see.

Introducing Glasseye, sort of


I like how this looks but I wanted more functionality—particularly the sidenotes from Tufte CSS. (Look! A sidenote!)

And yes, this is still a blog about making a blog. I’ll get to other topics eventually.

Looking around for options, I stumbled on Glasseye, by Simon Raper, which looks really nice. He’s combining Markdown, Tufte CSS, and d3.js in one package! Those are three things I like! And it’s in Python!

After digging into it, I ended up with a fairly stripped-down version of his code, to the extent that I just stuck it straight into a Lektor plugin rather than using the Glasseye package itself. (That reminds me, I really need to put the actual code for this site up on GitHub…) For now I have jettisoned all of his (very nice) d3 charts, because I’m instinctively against the idea of letting the Man design my plots for me. I can make my own d3 charts, thank you very much. (But they do look pretty snazzy.) I just noticed that the sidenotes don’t turn into pop-ups when the screen gets small, which is how they’re supposed to work. I should fix that. (Update: I did!)

On Monday I visited with Lenny Teytelman (I interact with him mostly through Twitter so I might as well link to that) of, and we talked about a variety of stuff including the intersection of open-access science publishing and startups. One of the reasons I made this blog was so I could stick those ideas somewhere…maybe that’ll be the next post.

Or maybe I’ll change the font sizes and write about that. And I know I’m going crazy with the sidenotes, they’re fun.

That was easy



The default style was kind of ugly so I decided to spruce it up with Tufte CSS, which is exactly what it sounds like.

A step-by-step account:

  1. First I downloaded the CSS file and fonts from their GitHub page and added them to this project.
  2. Then I changed the layout.html file to point to it.
  3. That was it! Looks pretty nice without any modifications whatsoever. I may tweak it a little bit later.

I just retroactively lied and changed the date on this post, solely so that it would show up in the right order on the blog page. Sue me.

I’m not super happy with how it looks right now anyway—I may decide to get rid of the entire page. Or I just need to write longer blog posts. Luckily no one will ever read this.