NP-Complete Breakfast

Journals! What Are They Good For?


Here's my publishing manifesto, or something like that. Not that I have any of this stuff figured out, of course. But I feel like I need to plant a flag somewhere, so I have something to point at.

Step 1: PPPR

I'm not going over the reasons that the current scientific publishing system is broken. I'll start by assuming that you, the reader, are basically on board (or at least pretty familiar) with these ideas (assuming you're some kind of scientist; if you aren't, all of this might be pretty confusing):

  1. Our work should always be open-access.
  2. In fact, we should publish pre-prints so that our work gets out before the review process is finished.
  3. And hey, if we're going to put our work out there as a pre-print, we might as well do the reviewing in public, too.

For brevity I'll refer to that train of thought as the Post-Publication Peer Review (PPPR) scheme. You can make a strong argument that this scheme leads to all kinds of benefits: faster scientific progress, a more level playing field, wider dissemination of knowledge, and so on. (There are also some criticisms and concerns about how viable the PPPR scheme is, but I don't think I can do them justice, so I won't try to summarize them. You can read a more thorough discussion in this blog post from Michael Eisen, or honestly in almost any random blog post from Michael Eisen.)

There have been some cool ideas that grow out of this, like the concept of an overlay journal. This is a journal that points to existing preprints rather than re-formatting and re-publishing them on its own servers or even (if you can imagine such a thing) on paper. An example of an overlay journal that points to arXiv preprints is Discrete Analysis, but they certainly aren't the first or the last.

Step 2: ???

I have drunk copious amounts of the PPPR Kool-Aid™—I'm convinced that we need to publish early and do so in an open manner. My main concern (and the only thing that makes this post different from any other post about PPPR) is with what PPPR doesn't fix, and with what arises from the rubble of the current system of journals.

Getting Scooped Still Sucks

One of the main concerns people have with pre-prints is that posting one will get them scooped—they'll put their research out there, some other lab will read it, and that lab will race to publish the same work "first" in a refereed journal. (Some of this belongs in the possibly-mythical post about academia that I'm planning to write.)

This is a pretty silly concern, because you can't scoop something that's already published (and pre-prints are published). Furthermore, malicious scooping is probably quite rare—it's more likely that the other lab just doesn't know what you're working on. (I'll get to that problem in a minute.)

But those answers sidestep a bigger issue, which is that scooping shouldn't be a thing in the first place. Research projects take years, but publishing a month after someone else is called "being scooped". That creates a crazy environment for doing research, and we are all worse off for it:

  • It's ridiculously stressful, particularly for less-established scientists and trainees. Through no fault of their own, their career trajectory can be permanently altered. This kind of thing drives people out of science.
  • It discourages labs without major resources from working in competitive areas. These are often the areas where we'd benefit most from fresh ideas.
  • When there is an "obvious" advance to be made, there's a mad rush to get there first and claim credit. This is a waste of time and resources, it creates dangerous incentives for doing bad science, and it leads to widely-covered multi-million-dollar patent disputes.

PPPR doesn't address this issue, so far as I can see. Instead of being scooped when you see the work in print, it happens when you see the work as a preprint. If we assume that preprints are deposited at about the same point that papers get submitted for review, this means a difference of several months but rarely more. In the cases when getting scooped really hurts—after multiple years of effort—it'll still really hurt. (The focus on priority is still relevant to questions of intellectual property and patents and whatnot; there's a whole secondary discussion to be had about those issues. They'll probably continue to exist regardless of the system we adopt, but first-to-file may have obviated some of the problems there.)

Whatever system we have, it shouldn't punish people for finding similar results simultaneously. If anything we should celebrate such discoveries: they are wholly independent replications of the finding! We don't get enough of that as it is.

Redundant Work is a Waste

While it's great to get independent validation of a result, it would likely be even better if those groups knew they were working on the same topic, and instead of competing with each other to publish first, they formed a loose collaboration and worked together. This doesn't need to be a formal endeavor—tight collaboration requires a lot of work to maintain, and when done poorly it leads to a lot of headaches. A loose collaboration is more about sharing protocols, preliminary and negative results, and ideas. This can be common within an institution but is typically rare when there is no pre-existing relationship between labs.

Often a type of collaboration can happen late in the game: two groups will learn of each other and try to publish together. That might not involve any real exchange of knowledge, and barely qualifies as collaboration in my mind. (It might be better described as collusion.) Those groups are trying to solve the prisoner's dilemma that comes from the threat of scooping—if one publishes first they get more glory, but the arbitrary delays of publication mean they risk losing it all to the other group. Publishing together is safer. But most of the hard work was done in parallel, without any communication between the labs. This wastes a lot of people's time and money, and leaves a more confusing scientific literature for everyone to look at later.

Step 3: Non-profit!

Here is where I am supposed to lay out a proposal for solving those problems. Unfortunately I don't really have one. I do have some vague ideas about how I wish research worked, and I'll outline those ideas here.

Open Notebooks

Rather than doing our work privately and then publishing a complete story at the end, we should be transparent about the (usually messy!) process, and highlight the results we think are notable enough to share with the community.

I'm hardly the first person to suggest this—many scientists are already doing it. (Which is super impressive; it's scary to put your work out there that way.) To change the culture of research, however, it can't just be a handful of idealists: it needs to be the standard practice. Unfortunately, there isn't a whole lot of incentive for any individual to do it. The potential benefits for other people are easy to imagine:

  • Access to data as soon as they are collected allows other researchers to build on results quickly.
  • Many eyes on the data help prevent errors and allow for novel interpretations, free from the bias or preferred outcome of the experimenter.
  • Public notebooks make p-hacking and all kinds of other shady practices much harder.
  • They also provide an honest account of the progression of the research, instead of the Newspeak-like "we have always been working on protein X" narrative of a paper.

That last point is important: being more open about how research is done is good for public engagement and it's good for the mental health of trainees, who get to see that their own experience is the norm. Research is hard and the research community should stop pretending otherwise.

To be clear, I realize that the paper lab notebook isn't going away. It's just too useful, and paper tends to last longer than bits despite our best efforts. We still need a way to share our progress as it happens.

What's less clear is why any individual would pursue this strategy. Beyond gaining some recognition and reputation for having the chutzpah to be open, it seems to present much more risk than reward. This is where the whole ecosystem needs to change, in a major way: we need a system that allows open collaboration at any level of engagement, and can aggregate that collaboration and contribution into a CV that the researcher can point to in the future. Luckily, we have a reasonably good model for what this could look like.

The GitHub Model

Open source software provides a model, albeit imperfect, for how to build this ecosystem. GitHub in particular is the de facto clearinghouse for a developer to display their credentials. (To be honest, I don't know how true this is nowadays.) Every contribution is recorded: from personal projects and major contributions to public resources, all the way down to opening a bug report on an obscure repository. Moreover, the quality of the work can be inspected: one can read the code a developer writes and judge its quality, and see how they interact with other projects and whether they provide helpful feedback or pointless criticism.

The equivalent ecosystem for researchers is really quite similar: a site (or a federation of sites!) where researchers can deposit their data, analyze and discuss their results, and collaborate with others at many levels. (Just as with code, there will always be "private repos" of work that is not yet ready to be released, but as the ecosystem evolves I think these will become less common.) Everyone starts with a mess and refines it slowly—everyone mislabels data and makes mistakes and recalculates. These mistakes are normal and don't need to be hidden, and starting from scratch in the open allows us to build bootstrapping materials that get projects started faster. Collaboration could range from helpful trouble-shooting to in-depth peer review—with open data and reproducible analysis pipelines, peer review can involve independent validation and reproduction of the analyses. Every researcher would always have an up-to-date CV that includes their own work, their collaborations big and small, and their service in the form of peer review and any other contributions to the scientific community.

Journals: what are they good for? Journalism!

To return to the title of this post: whither the existing journals in this future utopia? It's a common expectation that PPPR will make journals obsolete, because the ecosystem of preprints and public review will destroy the need for subscriptions. (This is mentioned in Eisen's post, for instance.) I don't know whether it would, but I definitely don't think it should. Journals are legitimately useful in many ways: they highlight notable research (as best they can); they report on news relevant to their target community; and they provide a secondary perspective, typically from an eminent third party, on particularly interesting work. Journals also tend to publish reviews, which can provide a useful overview of a given topic—typically these reviews are invited publications and thus fall outside the typical peer-review process.

All of these services are useful for researchers, and it's not difficult to imagine a journal that provides only those services: one that highlights notable work (in preprint form, or even straight from a public notebook) rather than trying to filter out the most "impactful" work from an ocean of research. A journal that focused on science journalism would be less biased by famous names and cozy relationships (though surely still biased, just like any media can be). It would be reacting to impactful research rather than decreeing what research should be impactful.

In this scenario it seems likely there would be far fewer journals—given the vast number of predatory and/or poorly-edited journals, this is probably a good thing. Journals that provide a value-add beyond "you're published" will still be viable—certainly the likes of Science and Nature will stick around to report on matters of interest to the entire scientific community. This will include reporting on the most notable research being done, but their role will be quite different: they will be obligated to discuss the work that the community deems notable, rather than acting as the gatekeepers who select it.

Behold: the QR card


I'm inordinately pleased with this. (That's me, in case you can't tell.)

How it works

It turns out that getting an arbitrary image into a QR code is annoyingly difficult. Or at least it took me quite a while to do it—partially due to inertia, as it required the concentrated boredom of a trip home for the holidays to get me focused enough to crack it. In the end, the code is fairly simple.

In short, I had to reverse-engineer a QR code library, figure out where the data was stored, put in the data I wanted, and then calculate the corresponding error correction code so that the thing would scan properly.

In more detail

The whole thing fits in a GitHub Gist, if you don't include the required python-qrcode library. The workflow is fairly simple:

  1. Get the layout of the QR code. (This took the longest to figure out; the simplest way was to create a blank code.)
  2. Load an image and extract the data from it, using the coordinates from step 1.
  3. For each of the bit masks:
    • Decode the data from the image, insert the desired URL, and correct any disallowed or undesirable bits
    • Recode the data and calculate the correct EC bits
    • Insert data and EC into the image and display it
  4. Choose which of the masks is best
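The "calculate the correct EC bits" part of step 3 is Reed-Solomon coding over GF(256), using the primitive polynomial 0x11D that the QR spec calls for. As an illustration of what that computation involves (a from-scratch sketch, not the code from the Gist), here is a minimal pure-Python Reed-Solomon encoder:

```python
# Minimal Reed-Solomon encoder over GF(256), the arithmetic behind the
# "EC bits" in step 3. Illustrative sketch only, not the code from the Gist.

# Build exp/log tables for GF(256) with 0x11D, the QR primitive polynomial.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for i in range(255):
    EXP[i] = _x
    LOG[_x] = i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]  # duplicated so products never index out of range

def gf_mul(a, b):
    """Multiply two GF(256) elements via the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def poly_mul(p, q):
    """Multiply two polynomials with GF(256) coefficients."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out

def generator_poly(nsym):
    """QR generator polynomial: the product of (x + alpha^i) for i < nsym."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, EXP[i]])
    return g

def rs_ec_bytes(data, nsym):
    """Compute nsym error-correction bytes by polynomial long division."""
    gen = generator_poly(nsym)
    msg = list(data) + [0] * nsym
    for i in range(len(data)):
        coef = msg[i]
        if coef:
            for j in range(1, len(gen)):
                msg[i + j] ^= gf_mul(gen[j], coef)
    return msg[len(data):]  # the remainder is the EC block

def poly_eval(p, x):
    """Horner evaluation; a valid codeword is zero at every generator root."""
    y = 0
    for c in p:
        y = gf_mul(y, x) ^ c
    return y
```

Appending `rs_ec_bytes(data, nsym)` to the data codewords gives a codeword that evaluates to zero at each root of the generator polynomial, which is a handy sanity check before drawing the bits back into the image.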

More Glasseye


Seems like I'm still pretty bad at updating this thing. Partially because I'm sure no one is reading it. Something of a Catch-22 there. So here's another website-customization post!


Tufte CSS suggests that you break articles up into sections, with a little bit of padding around them. That was a little tricky to do here because there's no section syntax in Markdown, but since I don't use horizontal rules, I can use that syntax to designate the beginning and ending of sections by replacing the resulting tag in my fork of Glasseye. It's a little hacky but it'll do. I'd prefer to use BeautifulSoup to do the processing, but I didn't see an easy way to wrap lots of tags into a section.
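For the curious, the tag-replacement trick can be sketched in a few lines of Python. This is a hypothetical stand-in for what my Glasseye fork actually does (the function name and regex are made up for illustration): split the rendered HTML on the `<hr>` tags that the horizontal-rule syntax produces, and wrap each piece in a `<section>`.

```python
import re

def sectionize(html):
    """Wrap the chunks between <hr> tags in <section> elements.

    Hypothetical sketch: Markdown's horizontal-rule syntax renders as
    <hr> (or <hr/>), so splitting on that tag and wrapping each piece
    recovers the section boundaries that Tufte CSS wants.
    """
    parts = re.split(r"<hr\s*/?>", html)
    return "".join(f"<section>{part}</section>" for part in parts)

# e.g. sectionize("<p>one</p><hr/><p>two</p>")
#   -> "<section><p>one</p></section><section><p>two</p></section>"
```

A BeautifulSoup version would instead collect the sibling tags between each pair of `<hr>` elements and re-parent them under a new `<section>`, which is exactly the part I found awkward.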

Because this post is pretty small (and also because I found a typo, so I was editing it anyway), I broke up the Team Science post into sections as well.

I couldn't tell you why I made this


Once I thought of it, I had to do it, and this is the least-shameful place to post it. Plus I get to test out the \(\LaTeX\) formatting here.

\[James^TWebber = \begin{bmatrix} JW & Je & Jb & Jb & Je & Jr\\ aW & ae & ab & ab & ae & ar\\ mW & me & mb & mb & me & mr\\ eW & e^2 & eb & eb & e^2 & er\\ sW & se & sb & sb & se & sr \end{bmatrix}\]

Go Team Science! (Or not)


I don't seem to have a handle on this blogging thing, yet. I have a long and rambling draft for a post about publishing, which has been languishing for months now. Meanwhile the NIGMS just released a Request-For-Information on "Team Science". That seems as good an excuse as any to write something here. They even provided me with a nice outline, and an introduction:

The National Institute of General Medical Sciences (NIGMS) seeks input on team-based scientific research and the manner in which this activity could be supported by the Institute. This Request for Information (RFI) will assist NIGMS in considering needs and opportunities in team science at the multi-investigator, institutional, regional and national level. Comments are invited from any interested party.

Hey, I'm an interested party, so let's do it. The RFI is broken into six areas, and I'll structure my post likewise. But first, a general overview of where I'm coming from, since I haven't posted anything else on this blog and you might not know. (On the off chance that the NIGMS doesn't read this blog, I'll also email this to them.)

The Short Version

The idea of team science (that is, big interdisciplinary groups working on large projects) sounds great, but when it gets implemented there are some challenges. Most of those challenges boil down to how we publish science, how we evaluate scientists, and how we train students and evaluate their work. Each of these issues deserves its own essay, but in short:

  • A whole lot of science gets done by trainees—post-docs and graduate students
  • Early-career scientists are evaluated on the basis of their personal accomplishments. Around here that almost always means your own project and a publication with your name at the front (or at the end, for new PIs).
  • Team science requires teamwork. It's really hard to collaborate in such a way that everyone involved can have their own publication at the end.

Because of those points, I don't think team science is a good use of funding. The goals of such a grant are not aligned with the interests of the individual labs. Unless universities and funding agencies are going to seriously change how they evaluate scientists, big teams don't really work in academia.

So, after laying down that thesis statement, I'll go over the sections of the RFI.

Interest in team science

Comment on the appropriateness and usefulness of team-based research to address particular areas of biomedical science. This may include comments on the relative importance of team science in your field and your own experiences with team science.

Not just appropriate and useful, I think team-based research is essential! There are so many open questions in biology that involve the intersection of many fields. The big challenge is in integrating the collective knowledge of so many different specialties.

I don't think that big grants given to specific groups are the best way to achieve that goal. In any funding system it's hard to pick winners—with a smaller number of large grants, that problem is compounded. The big-grant approach heavily favors the large institutions (like UCSF, for one; we have a lot of these grants) that have faculty studying a wide range of related topics and can come together to apply for major grants.

The most important thing the NIGMS can do to support team science is to set up and administer common resources. Rather than try to fund large consortia, build scientific clearing-houses that facilitate teamwork. And revamp funding criteria to reward scientists that play supporting roles on many projects.

Management and advisory structures in team science

Comment on the types of management structures within a project that would enable an effective team science program. This may include the organizational structure, leadership models, and use of advisory boards or external review groups. Comments on challenges and solutions for issues such as the training needed for effective team science leadership, approaches for maintaining communication within and between teams, and strategies for maintaining team effectiveness are welcome.

As alluded to in the previous section: the best way to manage and advise teams of scientists is to let them form on an ad-hoc basis, without management. Fund individual PIs on the merits of their own work, including their service in helping other labs. (I think the MIRA is a great start.) If the NIGMS provides support to make team-based science efficient and rewarding, PIs will figure out how to do it on their own.

Team composition

This may include comments on recruiting team members, the importance of training students or mentoring junior PIs involved in team science, the value of diversity in team science, and the challenges of recognizing individual efforts on team-based research within a university or research institute setting.

As I said at the top, I don't think trainees and junior PIs can thrive as cogs in a huge machine. But in a decentralized system they can be cogs in a smaller machine: their lab. If it is possible to reach out to collaborators easily, and they have incentives for providing assistance, it should be possible for small labs to engage in team science without being lost in the crowd.

Resources and infrastructure

Comment on the resources and infrastructure that are needed to support team science, including teams consisting of groups from multiple institutions. This may include comments on technical and administrative cores, both those currently available at your institution and those that would need to be established to support team science.

This is where the NIGMS can make the biggest impact, but not necessarily at the institutional level. Institutions should be able to facilitate collaboration within their own walls; what the NIGMS can facilitate is collaboration between institutions, by helping groups share results and feedback. One way to do that is to build common resources for sharing results, methods, reagents, and data. The NCBI does this for lots of computational data (although it could be much improved), and they're an indispensable part of biomedical research because of it. Whenever the NIGMS identifies a problem worthy of targeted grants, they should consider what common resources they can provide to the community, rather than assembling a dream team.

Collaboration is hard for much the same reason that team science in general is hard: it is difficult to align the interests of two research groups so that they work together effectively. I think the major obstacle is that the minimum unit of research that is useful to a scientist is the publication, and a publication is a large investment of time and effort. This means that the activation energy for collaborating is very high. (I had more to put here, but the short version is: the farther you get from first/corresponding author, the less likely the effort is to be worth it, and so it's a lower priority. Thus there's a limit to the size of collaborations.)

If the NIGMS really wants to encourage team science in a systematic way, it needs to develop a way to track, quantify, and reward collaboration at all levels. Providing a small part of ten projects must be as valuable as spearheading one. And this contribution can't just be recorded in author lists, which are subjective and inaccurate. We need a better way to record who helped with what on which piece of research, so that we can evaluate and reward the scientists who make an impact in many different areas: the team players.

Assessment of team science

This may include factors that should be considered in the peer review of team science-based grant applications, the value of interim reviews during the funding period, the importance of outreach activities, and the appropriate quantitative and qualitative measures of the success and impact of team science.

This bit goes out the window if you take my advice and stop providing grants to giant groups of PIs. Reward individual PIs for their work, based on its scientific merit. And don't do it based on specific project proposals, because they never work on what they say anyway.

Comments on past or current NIGMS team-based programs and funding mechanisms

Comments are welcome on the advantages or disadvantages of programs and grant mechanisms such as the National Centers for Systems Biology (P50), Centers for Injury and Peri-Operative Research (P50), Centers for HIV/AIDS-Related Structural Biology (P50), COBRE Centers (P20), Glue Grants (U54), the Protein Structure Initiative (U54), and Program Projects (P01).

UCSF has/had two P50 grants that I'm somehow associated with: one for a Systems Center and one for a HARC Center. My impression of those grants is that they certainly funded a bunch of projects, but they weren't the proximal cause of any team science. PIs have areas of research and trainees need their own publications. Collaborations happen when they further those goals and not otherwise.

Many projects at UCSF involve multiple labs working together, but that tends to be determined (and should be determined) by the scientific questions, not by the funding source. If the P50 grants were replaced with stable funding for many labs, I don't think the outcome would be less collaborative, and the scientific output would likely be higher.

How is any of this different from the status quo?

I'm not sure! In the end, I'm suggesting that the NIGMS step back from these team science grants and focus on labs. The biggest things I propose are somewhat tangential to this RFI: fund labs rather than projects, reward small units of collaboration and supporting work, and good things will happen.

But that's why I used this RFI as inspiration for a blog post in the first place: the questions NIGMS is asking were about the topics I wanted to discuss anyway.

Coming Soon (maybe)


I made this blog because some stuff doesn't fit in a tweet, so I should probably write something, huh? Here are the main things I want to write, eventually:

  • Something about publishing
  • Something about graduate education
  • Something about the research enterprise in general
  • (Maybe) something about statistics

If I ever write these posts, I'll update this one to reflect that.

Glasseye Upgrades


I fixed the way that Glasseye handles sidenotes and margin notes, so that they auto-hide (with a pop-up link) when the screen is too narrow to accommodate them. Job well done.

At this point my "Glasseye" implementation is not really similar to the original code, so I've created a GitHub project for this site. In case you wanted to see how the magic happens.

I imagine that at some point I may even break that plugin out into yet another repository, because there are still some tweaks I want to make…we shall see.

Introducing Glasseye, sort of


I like how this looks, but I wanted more functionality—particularly the sidenotes from Tufte CSS. (Look! A sidenote!)

And yes, this is still a blog about making a blog. I'll get to other topics eventually.

Looking around for options, I stumbled on Glasseye, by Simon Raper, which looks really nice. He's combining Markdown, Tufte CSS, and d3.js in one package! Those are three things I like! And it's in Python!

After digging into it, I ended up with a fairly stripped-down version of his code, to the extent that I just stuck it straight into a Lektor plugin rather than using the Glasseye package itself. (That reminds me, I really need to put the actual code for this site up on GitHub…) For now I have jettisoned all of his (very nice) d3 charts, because I'm instinctively against the idea of letting the Man design my plots for me. I can make my own d3 charts, thank you very much. (But they do look pretty snazzy.) I just noticed that the sidenotes don't turn into pop-ups when the screen gets small, which is how they're supposed to work. I should fix that. (Update: I did!)

On Monday I visited with Lenny Teytelman (I interact with him mostly through Twitter, so I might as well link to that), and we talked about a variety of stuff, including the intersection of open-access science publishing and startups. One of the reasons I made this blog was so I could stick those ideas somewhere…maybe that'll be the next post.

Or maybe I'll change the font sizes and write about that. And I know I'm going crazy with the sidenotes, they're fun.

That was easy



The default style was kind of ugly so I decided to spruce it up with Tufte CSS which is exactly what it sounds like.

A step-by-step account:

  1. First I downloaded the CSS file and fonts from their GitHub page and added them to this project.
  2. Then I changed the layout.html file to point to it.
  3. That was it! Looks pretty nice without any modifications whatsoever. I may tweak it a little bit later.

I just retroactively lied and changed the date on this post, solely so that it would show up in the right order on the blog page. Sue me.

I'm not super happy with how it looks right now anyway—I may decide to get rid of the entire page. Or I just need to write longer blog posts. Luckily no one will ever read this.

Okay I guess I have a blog now


I've finally caved in to my own egomania and decided that everyone needs to listen to me!

Initial thoughts on Lektor:

  • Pretty easy to set up something that works, and I like that it can deploy to GitHub pages automatically.
  • Not as flexible as I would hope—in fact not very flexible at all. Maybe I will discover more flexibility later.
  • It seems like I pretty much have to nuke my previous page to deploy this but that's what git is for, right? Let's see how it goes.