Epistemic Status: medium-high. The original paper was written by a seasoned computer scientist, experienced researchers have validated this approach to me, and I have personally found it transformative.
Epistemic Effort: medium. The main ideas are distilled from the paper, with some modifications and extensions of my own.
I asked Notion AI to write a poem about this post…
When research papers pile up high,
And overwhelm the searching eye,
A wise methodology is key,
To help us learn and prioritize effectively.
So many research papers…
The body of knowledge contained in published research is mind-boggling, and for those making their first foray into literature review, quite demoralizing.
The volume, velocity, verbosity, and vexing vernacular of these papers can overwhelm the individual without a wise methodology to discern, prioritize, understand, and digest them.
In this blog post, I’ll explain how you can manage the challenges of processing research literature using Srinivasan Keshav’s three-pass approach, and I’ll share some commentary on my own experience using it for literature review.
Keshav's Three-Pass Approach
Reading research literature without a methodology is like trying to navigate a dense forest without a map or compass. You may be able to find your way eventually, but it will be a slow and unnecessarily challenging process. You might pay attention to useless details and ignore the substance.
A simple methodology, like Keshav's "three-pass approach," is like having a well-marked trail. It allows you to navigate the terrain more efficiently, avoid getting lost, and ultimately reach your destination with minimal effort.
Keshav’s three-pass approach is quite simple.
Pass One (5-10 minutes): During the first pass, the reader conducts a high-level skim of the paper to determine if it's worth their time.
Pass Two (1 hour): During the second pass, the reader conducts a more detailed study of the paper, focusing on understanding its content, arguments, and supporting evidence.
Pass Three (4+ hours): During the third pass, the reader replicates the author’s work to validate assumptions and results, gain a better understanding of its innovations, and come up with ways to expand on the work.
In Keshav’s words,
“The first pass gives you a general idea about the paper. The second pass lets you grasp the paper’s content, but not its details. The third pass helps you understand the paper in depth.”[1]
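To see why the funnel shape matters, here is a back-of-the-envelope sketch in Python. It uses the per-pass times above (10 minutes for the first pass, the upper end of the range, and the 4-hour minimum for the third), while the promotion rates are hypothetical numbers picked for illustration, not figures from Keshav.

```python
# Per-paper time for each pass, in minutes, based on the estimates above.
# The first pass uses the upper end of 5-10 minutes; the third pass uses the
# 4-hour minimum, so real third passes will usually cost more.
PASS_MINUTES = {1: 10, 2: 60, 3: 240}


def reading_hours(n_papers: int, frac_second: float, frac_third: float) -> float:
    """Expected hours to process n_papers with the three-pass approach.

    Every paper gets a first pass; frac_second of all papers get a second
    pass and frac_third of all papers get a third pass. The fractions are
    hypothetical inputs, not values from Keshav's paper.
    """
    minutes = n_papers * (
        PASS_MINUTES[1]
        + frac_second * PASS_MINUTES[2]
        + frac_third * PASS_MINUTES[3]
    )
    return minutes / 60


# Reading all 20 papers in full vs. a harsh funnel that promotes only a few.
print(f"{reading_hours(20, 1.0, 1.0):.1f} hours")   # no triage
print(f"{reading_hours(20, 0.2, 0.05):.1f} hours")  # 4 second passes, 1 third pass
```

With these made-up rates, triage cuts roughly 103 hours of reading down to about 11.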
Let’s dig into each pass.
First Pass
When beginning to read a research paper, it is naïve to dive right into reading the entire document. That's where the first pass comes in. The purpose of the first pass is to quickly evaluate whether or not the paper is worth investing your time and energy into reading.
It’s like deciding if you should dine at a restaurant. You scan the menu before going in, and if you like what you see, you walk in.
Here are some steps to follow when conducting a first pass:
Step 1: Skim the Paper
The first pass is not a read of the paper. It's more of a bird's-eye view. You want to get a high-level understanding of what the paper has to offer. The entire purpose of a first pass is to answer one question:
Do I want to invest time and cognitive energy into reading this paper?
When you skim the paper, only read the title, abstract, introduction, section headings, and conclusion.
Step 2: Consider the 5 C's
In his paper, Keshav suggests that the reader consider the 5 C's: category, context, correctness, contributions, and clarity. Here's what to look for:
Category: What type of paper is it?
Context: What is the background of the research?
Correctness: Is the research methodology sound? Are there any flaws in the research design?
Contributions: What are the main findings of the research? What new knowledge does this paper contribute to the field?
Clarity: Is the paper well-written? Is it saturated with confusing verbiage?
Step 3: Decide Whether to Continue Reading
After considering the 5 C's, it's time to make a decision: Is this paper worth reading further? If the paper falls within your area of expertise, is well-written, and offers valuable contributions to the field, it may be worth investing more time in. However, if the paper is trivial or boring, or if it falls outside of your area of expertise, it may be best to move on to another paper.
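If you like keeping first-pass notes in a structured form, the 5 C's and the Step 3 decision can be captured in a small record. This is a minimal sketch of my own; the field names, the yes/no judgments, and the example paper are assumptions for illustration, not something Keshav prescribes.

```python
from dataclasses import dataclass


@dataclass
class FirstPassNotes:
    """One record per paper skimmed during a first pass (hypothetical schema)."""
    title: str
    category: str        # What type of paper is it?
    context: str         # What background does it build on?
    correctness: str     # Does the methodology look sound at a glance?
    contributions: str   # What does it add to the field?
    well_written: bool   # Clarity: is it readable without wading through jargon?
    in_my_area: bool     # Does it fall within my area of expertise?
    valuable: bool       # Do the contributions look worth my time?


def continue_reading(notes: FirstPassNotes) -> bool:
    """Step 3: keep reading only if the paper is in my area, well written, and valuable."""
    return notes.in_my_area and notes.well_written and notes.valuable


notes = FirstPassNotes(
    title="Hypothetical paper on exercise and sleep",
    category="meta-analysis",
    context="Builds on randomized trials of exercise interventions",
    correctness="Inclusion criteria stated; too early to judge",
    contributions="Pooled estimate of the effect of exercise on sleep quality",
    well_written=True,
    in_my_area=True,
    valuable=True,
)
print(continue_reading(notes))  # True
```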
Tip: Batch First Passes in Sprints
To be efficient and save time, consider batching first passes in sprints. Run through 3-7 papers in one sitting, rapid-fire, all related to the same domain. This can help you quickly evaluate multiple papers and determine which are worth further reading.
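A sprint plan can be as simple as grouping the inbox by domain and splitting each group into sittings. The sketch below assumes a hypothetical inbox of `(title, domain)` pairs. It uses as few sprints as possible per domain while capping each at seven papers, and it sizes them evenly so that, for any domain with at least three papers, no sprint falls below three.

```python
import math
from collections import defaultdict


def plan_sprints(inbox, max_size=7):
    """Split (title, domain) pairs into first-pass sprints.

    Papers are grouped by domain so each sitting stays on one topic. Each
    domain gets the fewest sprints that respect max_size, and papers are
    spread evenly so sprint sizes differ by at most one.
    """
    by_domain = defaultdict(list)
    for title, domain in inbox:
        by_domain[domain].append(title)

    sprints = []
    for domain, titles in by_domain.items():
        n_sprints = math.ceil(len(titles) / max_size)
        base, extra = divmod(len(titles), n_sprints)
        start = 0
        for i in range(n_sprints):
            size = base + (1 if i < extra else 0)  # first `extra` sprints get one more
            sprints.append((domain, titles[start:start + size]))
            start += size
    return sprints


inbox = [(f"sleep paper {i}", "sleep") for i in range(8)] + [
    ("LLM survey", "llm"),
    ("LLM benchmark", "llm"),
]
for domain, batch in plan_sprints(inbox):
    print(domain, len(batch))
```

Eight sleep papers become two sprints of four rather than one sprint of seven plus a lonely eighth paper.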
Revisions
Personally, I’ve made some revisions to these guidelines, so I don’t implement this verbatim. In my experience, the contributions and clarity are what I really pay attention to. I don’t think a first pass is enough to judge correctness (which I assess in the second pass).
My decision on whether to continue reading also depends heavily on whether the paper is immediately relevant to my learning goals (see the parsimonious learning principle below). If I am looking to improve my cardiovascular fitness, I have to believe the insights of a paper are relevant to that learning goal to consider it for a second pass.
Second Pass
Now it’s time to actually read the darn thing. The second pass of a research paper involves a more detailed study of the paper, focusing on understanding its content, arguments, and supporting evidence. During this pass, you should pay more attention to diagrams, figures, and illustrations and make notes of unfamiliar references that you might want to read later.
Be sure to take notes in the margins:
If a passage is unclear, take the time to parse it and rewrite it in your own words.
Define unfamiliar terminology.
Make note of weaknesses or cleverness.
Jot down ideas on how you might apply it in your own work or life.
If the paper is mathematical in nature, I think proofs, algorithms, and notation can be ignored in most cases, unless your goal is to understand the machinery. In most second passes, the objective is to understand the thesis of the work, its supporting evidence, and how the authors conducted the study. Of course, this objective is generic and does not apply to all situations.
Third Pass
The third pass is the most time- and effort-intensive. It involves replicating the author's work, which can serve several goals, including:
Validate assumptions and results
Gain a deep understanding of the innovation
Conduct research to expand on the work
Implement the paper to solve a business or personal problem
“By comparing this re-creation with the actual paper, you can easily identify not only a paper’s innovations, but also its hidden failings and assumptions.”[2]
Due to the time and energy costs of third passes, you will (and should) rarely do them, if ever.
Scoping a New Domain Using Keshav’s Method
In some cases, you want to read research literature to make a foray into a new domain. This situation presents additional challenges:
Deciding which papers should go into the inbox
How to judge a paper’s contributions without enough background knowledge to make a sound judgment
Papers are not readable without a basic mental model of the key concepts and terminology
In this case, the learning goal is to scope the state of research in an unfamiliar domain. I’ve applied the three-pass approach to this problem quite successfully (e.g., to get up to speed on LLMs), but I had to make a few addenda.
Addendum 1 - Prioritize Survey Papers
Your first sprint should prioritize recently published survey papers. You can search for these on an academic search engine like Google Scholar. Surveys are typically written by seasoned veterans of the research domain and are meant to summarize the history of the field, the current status of research, and open questions. They provide insight into the key concepts you need to know, the different research factions, the different paradigms, etc.
The goal of a first pass in this case is to identify which of the papers will likely give you the best introduction to the domain. These are longer papers, so try to narrow it down to one.
The purpose of the second pass is to build up a mental model of the domain so that specific papers have a place to land (intellectually).
Understand the historical context. You should add the landmark papers to your queue for the next sprint.
Make note of the jargon. Define recurring unfamiliar terms in the margins.
Build an ontology (I personally use Lucidchart for this). How are the key concepts related?
What are the big questions that need answering? These questions will help you contextualize the contributions of papers that you read next (papers that help answer those open questions are of greater importance, and should be prioritized).
Once you have a lay of the land, you can determine which area to dive into deeper.
Addendum 2 - After Survey Papers, Prioritize Meta-Analyses
Once you are armed with a mental model, it’s time to parse through the research-heavy literature. However, before collecting studies, I would next prioritize meta-analyses.
Think of a meta-analysis as a study of studies: a way to combine information from multiple studies. The goal is to produce one overall estimate of the size of some effect being studied. This estimate is like an average of the effects found in each study, except that each study's effect is weighted, typically by its precision, so larger and more precise studies count for more.
The problem with reading one or two studies is that their results might be spurious. Publication bias (positive results are more likely to be published than null results) inflates the rate of false positives in the literature, which is why replication is a vital epistemological best practice. Rather than putting all the effort into synthesizing these results yourself, turn to a meta-analysis that does it for you in a rigorous fashion.
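To make the weighting concrete, here is a minimal Python sketch of one common scheme, the fixed-effect model with inverse-variance weights, where each study counts in proportion to 1/SE². Real meta-analyses often use other models (random-effects models, for example), and the effect sizes and standard errors below are made-up numbers, not data from any study.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Pool study effect sizes with inverse-variance weights.

    Each study's weight is 1 / SE^2, so precise (usually larger) studies
    dominate. Returns the pooled effect and its standard error.
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect, and at least one study")
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


# Three hypothetical studies: the first is by far the most precise (smallest SE).
effects = [0.20, 0.60, 0.70]
std_errors = [0.10, 0.30, 0.30]
pooled, se = fixed_effect_pool(effects, std_errors)
print(f"pooled effect = {pooled:.3f} (SE {se:.3f})")
```

With these numbers the pooled effect comes out around 0.28, much closer to the precise first study than to the unweighted mean of 0.50.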
When it comes to meta-analyses, the goal of a first pass is to identify which ones are relevant to the specific variable of interest. If I am interested in understanding how I can use exercise as a tool to improve sleep, I want to find the meta-analysis that will reveal the greatest insights on that particular relationship. If the decision is tough, use the publication date as a tie-breaker (pick the more recent one).
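The selection rule above fits in a few lines. In this sketch, `relevance` is a hypothetical 0-3 score you assign yourself during the first pass (0 means it is not about the variable you care about), and the titles are made up. The most relevant meta-analysis wins, and the publication date breaks ties in favor of the newer one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MetaAnalysis:
    title: str
    relevance: int   # my own 0-3 judgment from the first pass (hypothetical scale)
    published: date


def pick_meta_analysis(candidates) -> Optional[MetaAnalysis]:
    """Return the most relevant meta-analysis, preferring the newer one on ties."""
    relevant = [m for m in candidates if m.relevance > 0]
    if not relevant:
        return None
    return max(relevant, key=lambda m: (m.relevance, m.published))


candidates = [
    MetaAnalysis("Exercise and sleep quality (older)", 3, date(2015, 6, 1)),
    MetaAnalysis("Exercise and sleep quality (newer)", 3, date(2021, 3, 1)),
    MetaAnalysis("Exercise and mood", 1, date(2023, 1, 1)),
]
print(pick_meta_analysis(candidates).title)  # Exercise and sleep quality (newer)
```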
The goal of the second pass is to understand the consensus or the lack thereof. Perhaps there is too much contradictory evidence, and it’s too early to call, but that is still useful to know.
Addendum 3 - Identify Top Conferences and Journals
Once you have scoped the domain with a survey paper and understood the current consensus on important research questions, you might be done. Maybe you just wanted some familiarity with the current status of the domain. I would call this a “Pareto dive” into an entire research field… a survey paper and some meta-analyses.
But maybe your learning goals require a deeper dive, and you’ll need to start processing recent studies. We are back at the velocity and volume problem… there is so much literature released every month, in every domain. Can we apply some simple heuristics to narrow this space down further?
Yes! A very effective way to filter information is to restrict your information diet to publications in the top journals and conferences, since these are heavily peer-reviewed. Peer review signals that other experts in the field looked at a paper and decided, “Yep, there is something worth sharing here.”
You can simply ask someone with expertise in the field for those top journals/conferences, or you might find that information online.
Once you have a short list of the top conferences and journals, start reviewing recent publications relevant to your particular area of interest and add them to your inbox. As always, rely on the references in those papers to guide you to other papers worth reading, especially highly cited ones (citation count is a rough indicator of impact).
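Here is that filtering heuristic as a small Python sketch. The paper titles, venue names, years, and citation counts are placeholders; in practice the `top_venues` set would come from asking an expert or searching online, as described above.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    venue: str
    year: int
    citations: int
    topic: str


def shortlist(papers, top_venues, topic, since_year):
    """Keep recent papers on `topic` from top venues, most-cited first."""
    kept = [
        p for p in papers
        if p.venue in top_venues and p.topic == topic and p.year >= since_year
    ]
    return sorted(kept, key=lambda p: p.citations, reverse=True)


papers = [
    Paper("Paper A", "Venue X", 2023, 120, "sleep"),
    Paper("Paper B", "Venue Y", 2024, 15, "sleep"),
    Paper("Paper C", "Workshop Z", 2024, 300, "sleep"),   # not a top venue
    Paper("Paper D", "Venue X", 2018, 900, "sleep"),      # too old
    Paper("Paper E", "Venue Y", 2024, 40, "nutrition"),   # off-topic
]
for p in shortlist(papers, {"Venue X", "Venue Y"}, "sleep", since_year=2022):
    print(p.title, p.citations)
```

Raw citation counts favor older papers, so the ranking is most meaningful among papers published around the same time.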
I should add a caveat that I am not giving peer review the stamp of approval. Peer-reviewed literature has plenty of problems arising from incentive structures and gate-keeping structures (e.g. a handful of people control the paradigm of a research area, even if the paradigm is wrong), but it’s a good filtering heuristic to rely on for scoping a new domain.
Principles
Be Harsh
The first principle is to be harsh.
If at any point you start to doubt whether a paper is worth your time, toss it!
If you are hesitant about whether to promote a paper to the next pass, toss it!
Parsimonious Learning
The second principle is to be parsimonious.
An aphorism I rely on for autodidactic endeavors:
To learn something is to not learn something else.
Attention is currency. Every minute of attention you pay to read a paper is a minute of attention not paid to reading another paper (which may be far more substantial).
Focus on just-in-time information, not just-in-case information.
There are numerous papers out there that are very interesting and will pique your curiosity, but it is essential to remain focused on what is actually relevant to your current needs. By being selective with the papers you read, you will be able to devote your full attention and energy to the ones that truly matter.
The first pass gives you a sense of what is out there; you learn what you don’t know.
The second pass is when you really want to understand a paper, but this should only happen if that paper is immediately relevant to some learning goal.
The third pass is only necessary if the research needs to be implemented to solve a problem or if you are conducting research on the same topic.
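Taken together, the two principles amount to a simple promotion rule. This is my own sketch of it in Python, not something from Keshav's paper; the names are hypothetical, and `worth_it` is `None` when you are unsure, which under the harshness principle counts as a no.

```python
from enum import Enum
from typing import Optional


class Next(Enum):
    TOSS = "toss it"
    STOP_AFTER_FIRST = "stop after the first pass"
    SECOND_PASS = "do a second pass"
    THIRD_PASS = "do a third pass"


def next_step(worth_it: Optional[bool], relevant_now: bool,
              implementing_or_researching: bool) -> Next:
    """Decide how far to take a paper after its first pass."""
    if worth_it is not True:          # be harsh: any doubt (None) means toss
        return Next.TOSS
    if not relevant_now:              # parsimony: just-in-time, not just-in-case
        return Next.STOP_AFTER_FIRST
    if implementing_or_researching:   # third passes only when the work demands it
        return Next.THIRD_PASS
    return Next.SECOND_PASS


print(next_step(None, True, False).value)   # toss it
print(next_step(True, False, False).value)  # stop after the first pass
print(next_step(True, True, False).value)   # do a second pass
```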
On arXiv, Google Scholar, and Papers With Code
When looking for papers to add to your inbox, be cautious with Google Scholar and arXiv. Almost anything can end up there, including papers that have not been peer-reviewed. It is generally better to look at conference proceedings and journal publications.
However, the pre-print versions of papers on arXiv may contain more information than the published conference versions, since conferences impose page limits. It might be worth reading the arXiv version during your second or third pass, as its appendix can contain useful information.
The limitation of peer review is that it takes time, so there is a nontrivial lag between when research is conducted and when it is available for reading. If you are a practitioner in a fast-moving field (like AI), it can be advantageous to look at pre-prints on arXiv to get a sense of what is really the bleeding edge. Just keep in mind that they might contain mistakes or serious methodological errors.
Now I am going to switch gears to research domains in computer science (so if that’s not your thing, skip this part). Papers With Code is gaining popularity… and I love the idea. I think it’s a great movement to share both code and data. But practically speaking, the code can be a distraction. Researchers are not always the best programmers, so don’t spin your wheels trying to understand their Jupyter notebooks. If you are looking for an implementation and there is no well-designed one in a library, consider making an open-source contribution to the most relevant toolbox (and leverage its code review and community calls).
Conclusion
I’ll conclude with an important caveat: the three-pass approach is a general-purpose approach to processing research papers, but it is not universally applicable. There are circumstances where this methodology does not make sense, and everyone will adapt the workflow to suit their needs.
If you have a specific question in mind, just look for what is relevant. Sometimes you are looking for data, and sometimes for a model, and sometimes for a relationship. What you pay attention to depends on what questions you are trying to answer.
That being said, the three-pass approach is simple, effective, and used by both beginners and experts. I’ve talked to several seasoned scientists and a majority follow this approach or something similar.
In this post, I’ve introduced the methodology abstractly; in a follow-up post, I introduce a concrete system for putting it into practice.
References
[1] Keshav, Srinivasan (2007). How to Read a Paper.
[2] Keshav, Srinivasan (2007). How to Read a Paper.