Taking Information Search in Notion to the Next Level
Adding a semantic search layer to Notion databases with Notion AI
Epistemic Status: high; results are clearly better than the status quo search
Epistemic Effort: medium; I tested the design for a couple of months before writing
Imagine being able to effortlessly find the most relevant information from your vast collection of media notes and writings stored in a Notion database. The current paradigm is to create database properties to search against, but this has two serious flaws:
Friction and maintenance. There are potentially many dimensions to search against, and that just means more properties, more effort in defining document metadata, and more friction in processing information to enable searchability (e.g. thinking about the correct tags to assign to a document).
Poor search results. You are not guaranteed the best results. Metadata search relies on exact match, which is sensitive to synonyms and polysemy (words can have multiple meanings). You might get documents that are not actually relevant, or miss documents that are.
In this piece, I’ll share a new Notion AI design pattern that addresses these problems. I work on AI systems, and lately I’ve been paying more attention to streamlining information processing with AI. The way modern, state-of-the-art information retrieval systems (e.g. search engines) work inspired me to replicate the approach in Notion for personal use. I’ve been using it for a few months to verify its potency, robustness, and applicability, and I can confidently say it’s one of the more exciting and impactful designs I’ve implemented. I hope readers find it useful :)
P.S. Design patterns require intermediate system proficiency (reference to my post on system proficiency levels), so if you are still learning the basics of Notion, this read might be too advanced. For those interested in systems design and thinking, this post is for you!
Information Processing Systems
Let me first provide some context on where this all fits in the broader picture. Information processing involves identifying relevant information from various sources, such as the Internet, and then processing it by extracting and structuring the salient information. This processed information is then stored for future access when needed. Information processing encompasses both indexing and search strategies.
The indexing strategy focuses on storing concise and useful representations of the information you have interacted with. The goal is to make this information easily searchable.
On the other hand, the search strategy is about finding the most relevant information for the question or challenge at hand. The technique at the center of this blog post is an advanced approach to the search strategy, and it is considered the state-of-the-art method for searching information: search engines like Google use it, recommendation systems like Netflix’s use it, social media feeds like Instagram’s use it, and so on.
When designing information processing systems, I first look at how the big players do it. You know, companies that make money by offering the best information processing services. That’s obviously super technical and will fly over the heads of most, which is why I like to take what I learn about those systems, replicate the ideas in my own personal systems, and communicate them in plain English for everyone else and/or create a template system that others can duplicate.
AI Meets Knowledge Management: Semantic Search
Alright, it’s time to introduce the magic: semantic search. To understand it better, let's take Google search as an example. Google search is an interface that lets users submit queries and quickly reach the most relevant pages out of billions on the internet. It’s fundamentally an information processing problem.
Historically, these kinds of systems relied on keyword indexing. Keyword matching poses challenges, such as the synonym problem. For instance, if you search for "heart disease," documents discussing "cardiovascular disease" may not be found, even though they refer to the same concept.
Additionally, words can have multiple meanings or interpretations, known as polysemy. For example, the word "blue" can refer to the color or someone feeling sad. Context is necessary to understand the intended meaning. A traditional keyword matching system would return all documents containing the word "blue," leading to noise in the search results. The signal gets diluted.
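To make the synonym and polysemy problems concrete, here is a minimal keyword-matching sketch in Python. The three documents and the `keyword_search` helper are hypothetical, and real keyword engines add refinements such as stemming and ranking, but exact word matching fails in the same two ways.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into bare words (punctuation dropped)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query: str, documents: dict[str, str]) -> list[str]:
    """Return titles of documents that contain every word of the query."""
    terms = tokenize(query)
    return [title for title, text in documents.items() if terms <= tokenize(text)]


# Hypothetical notes used to illustrate the two failure modes.
documents = {
    "cardio": "Cardiovascular disease risk rises with chronic stress.",
    "paint": "The blue paint on the walls dried overnight.",
    "mood": "The boy was feeling blue after the game.",
}
```

Searching "heart disease" finds nothing even though the "cardio" note is about exactly that (synonyms), searching "blue" returns the paint note alongside the mood note (polysemy), and searching "sad" misses the note about a boy feeling blue.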
Searching based on metadata or keywords is therefore a weak approach to information retrieval. This is evident in how Notion databases work: they allow searching based on properties (document metadata), but you run into the problems mentioned earlier, namely the effort of assigning metadata and poor results.
With semantic search, the objective is to find documents that are semantically relevant to the query, rather than documents that match its keywords. It moves away from symbols (searching for letters/numbers: is “blue” in “the boy was feeling blue”? Yes) towards meaning (searching for concepts/ideas: is the idea of sadness in “the boy was feeling blue”? Also yes).
Recent advances in AI have made semantic search practical. It compares queries to documents based on their meaning to determine relevance. Semantic search addresses synonyms by understanding the intended concept: searching for "cardiovascular disease" would also retrieve documents about "heart disease." It also handles polysemy by interpreting your query in context. For example, if you use a phrase like "the boy was feeling blue," it recognizes that "blue" refers to a feeling, not the color, and returns documents about boys feeling sad (semantics) rather than documents that merely contain the words “boy” and “blue” (keyword match). The result is more accurate search based on meaning.
Implementing Semantic Search in Notion
Implementing semantic search in Notion requires the use of Notion AI properties, which are available with a Notion AI subscription. These properties have access to the content within Notion documents and can provide summaries or answer questions based on the document's content. By setting up a Notion AI property specifically to search for a particular concept or topic, you can retrieve documents that discuss that concept without relying on extensive metadata.
Here is the recipe for adding a semantic search layer to your Notion database:
Create a Notion AI property of “Custom autofill” type.
Write a query template that asks the AI to determine whether the document content is relevant to the question or topic.
In the query, specify how you want the AI to respond for both cases (relevant or not relevant).
Create a filter for the Notion AI property that removes all entries where the AI classified the document as irrelevant.
When searching for a new topic/question, modify the AI script by adding your query, and rerun the property for all documents.
Test the script and iterate until you get quality results.
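The recipe above can be sketched as a loop. This is a simulation, not Notion's API: `ask_ai` is a hypothetical stand-in for the Notion AI custom autofill property (inside Notion, the property runs this step for you), and the `!= "N/A"` check plays the role of the database filter. The prompt is the template this post uses later, with `{topic}` as the placeholder.

```python
from typing import Callable

# The post's prompt template; {topic} is filled in for each new search.
TEMPLATE = (
    "Topic: {topic}\n"
    "If this topic is mentioned, summarize the insights in 3 sentences.\n"
    'If not, tell me EXACTLY the following: "N/A".'
)

IRRELEVANT = "N/A"  # the exact reply the prompt asks for when the topic is absent


def semantic_search(
    pages: dict[str, str],
    topic: str,
    ask_ai: Callable[[str, str], str],
) -> dict[str, str]:
    """Run the search prompt over every page and keep only the relevant ones.

    `ask_ai(prompt, page_text)` is a hypothetical stand-in for the Notion AI
    custom autofill property: it gets the prompt plus the page content and
    returns the text that would be written into the property.
    """
    prompt = TEMPLATE.format(topic=topic)  # insert the query into the template
    results = {}
    for title, text in pages.items():  # rerun the property for every page
        answer = ask_ai(prompt, text).strip()
        if answer != IRRELEVANT:  # filter out pages classified as irrelevant
            results[title] = answer
    return results
```

Because the topic lives only in the prompt, searching for something new means changing one string and rerunning the property; no page needs a new tag.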
For example, if you have a massive Notion database with book notes and you want to find books that mention or discuss the concept of "Innovation," you can set up a Notion AI custom autofill property to search for that specific concept within the document content. Tell the AI to respond “N/A” if the document is irrelevant. Then, you can create a filter to retrieve the responses where the concept is mentioned. Running this AI property with the query over all documents and adding a filter will return all the documents that talk about “Innovation”, and you don’t have to worry about tagging documents with “Innovation” manually.
Here is the template I am currently using:
Topic: {TOPIC}
If this topic is mentioned, summarize the insights in 3 sentences.
If not, tell me EXACTLY the following: "N/A".
I use three filters to exclude pages where the AI search contains “N/A”, “not discussed”, or “not mentioned”, because the AI does not always reply with exactly “N/A”. Here is an example of a query for the topic of “Discomfort.” I have many podcast episodes in this database, but the filters narrow the list down to two episodes that touch on the idea of discomfort.
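The three exclusion filters can be expressed as a small predicate. This sketch assumes each filter is a "does not contain" text filter on the AI property, combined with AND, and that matching is case-insensitive; both are assumptions worth checking against how your own Notion filters behave.

```python
# Phrases that mark a response as "topic not found". "N/A" is the reply the
# prompt asks for; the other two catch answers where the AI paraphrases it.
EXCLUDE_PHRASES = ("N/A", "not discussed", "not mentioned")


def keep_page(ai_response: str) -> bool:
    """Return True if the response contains none of the exclusion phrases.

    Mirrors three "does not contain" filters combined with AND. Matching is
    case-insensitive here, which is an assumption about Notion's filters.
    """
    response = ai_response.lower()
    return not any(phrase.lower() in response for phrase in EXCLUDE_PHRASES)


def filter_pages(responses: dict[str, str]) -> list[str]:
    """Return the titles of pages whose AI response survives all three filters."""
    return [title for title, text in responses.items() if keep_page(text)]
```

One trade-off of substring matching: a genuine summary that happens to contain "not mentioned" (e.g. "discomfort is not mentioned by name, but...") would also be filtered out, which is one more reason to ask the AI for an exact sentinel like "N/A".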
You can experiment with different templates or query structures to test the quality of the results. It's important to check whether the AI handles synonyms and polysemy correctly; this is how you evaluate the effectiveness of the query you've added to the Notion AI property. Testing is necessary, as the quality of Notion AI can change over time. In the past, I've had queries that suddenly stopped working and had to be tweaked. So, it's an iterative process of testing and refining.
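One way to make this testing systematic is to keep a few hand-labeled pages, including synonym cases and polysemy traps, and score the AI's answers against your labels with precision (what share of returned pages are relevant) and recall (what share of relevant pages were returned). The `LabeledPage` structure and `evaluate` helper below are hypothetical; you would fill in the responses the Notion AI property actually produced.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LabeledPage:
    """A page you have labeled by hand, plus what the AI property returned."""
    title: str
    relevant: bool     # your judgment: should this page match the topic?
    ai_response: str   # the text the Notion AI property produced for it


def evaluate(
    cases: list[LabeledPage], is_hit: Callable[[str], bool]
) -> tuple[float, float]:
    """Return (precision, recall) of the AI search against your labels."""
    hits = [c for c in cases if is_hit(c.ai_response)]
    true_hits = [c for c in hits if c.relevant]
    relevant = [c for c in cases if c.relevant]
    # By convention here, an empty denominator counts as a perfect score.
    precision = len(true_hits) / len(hits) if hits else 1.0
    recall = len(true_hits) / len(relevant) if relevant else 1.0
    return precision, recall
```

For a topic like "sadness", a page about a boy "feeling blue" should be returned while a page about blue paint should not. Rerunning the evaluation after a prompt tweak, or after Notion AI changes, shows whether results got better or worse.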
Ultimately, you want to specify that you're looking for content related to a specific concept and define the desired response for when the concept is present or absent. It's crucial to pay attention to the structure of the response from Notion AI, because the database filters match keywords in that response.
This semantic search pattern simplifies information retrieval, allowing you to focus on content rather than metadata. It overcomes the challenges of synonyms and polysemy associated with keyword-based searching. By streamlining information retrieval in Notion databases, you can unlock the knowledge captured within your documents.
Use Case: Podcast Information Processing
I want to talk about a use case for Notion semantic search that I've been using. It has been a game changer for me in processing and searching my podcast notes. Podcast processing has been a challenge for me in the past. I have been fairly good at information processing for books and articles, but I have struggled to find a good workflow for processing information from podcasts. Now, it’s my most effortless workflow.
The indexing strategy in this use case does not involve Notion. There is an app called Snipd that uses AI to transcribe podcasts and lets you snip and save the parts you find important. When you snip a segment (1-2 minutes), the app identifies a relevant range in the transcript and generates key takeaways using AI. It also generates a title for the snip and lets you assign your own tags or add notes (I usually let the AI take the notes).
Each podcast has an AI-generated summary of all your notes, highlighting the main ideas you snipped. So Snipd helps with information indexing by automating the identification of salient information to represent the main ideas from that segment of the podcast.
The next step is deciding where to store this information. Ideally, you want to store it where you can easily search for it. This is where Notion comes in. Notion is where I store the podcast snips for future retrieval. While Snipd itself provides search functionality, I prefer to have all my media notes in my Notion system.
There is an integration between Snipd and Notion, so Snipd can automatically create a new entry in your podcast database in Notion. This entry includes your AI-generated summary with the main ideas you snipped, plus each snip's time frame, AI-generated title, AI-generated key takeaways, associated tags, and transcript.
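To see why no manual tagging is needed, here is a hypothetical data model of what such an entry might hold. The class and field names are assumptions for illustration, not Snipd's actual export format, and the sketch assumes the integration writes these fields into the page body, where a Notion AI property can read them.

```python
from dataclasses import dataclass, field


@dataclass
class Snip:
    """One saved podcast segment (hypothetical fields)."""
    title: str                 # AI-generated title of the snip
    start: str                 # start timestamp, e.g. "1:02:15"
    end: str                   # end timestamp
    takeaways: list[str]       # AI-generated key takeaways
    tags: list[str] = field(default_factory=list)
    transcript: str = ""


@dataclass
class EpisodeEntry:
    """One podcast episode as it might land in the Notion database."""
    episode: str
    summary: str               # AI-generated summary of all snips
    snips: list[Snip] = field(default_factory=list)

    def page_text(self) -> str:
        """Flatten the entry into the plain text an AI property would read."""
        lines = [self.episode, self.summary]
        for snip in self.snips:
            lines.append(f"{snip.title} ({snip.start}-{snip.end})")
            lines.extend(f"- {takeaway}" for takeaway in snip.takeaways)
            if snip.tags:
                lines.append("Tags: " + ", ".join(snip.tags))
            if snip.transcript:
                lines.append(snip.transcript)
        return "\n".join(lines)
```

The point of `page_text` is that everything the AI property needs (summary, takeaways, transcript) is already in the page, so the semantic search layer works without extra metadata.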
Here is an example of my snips from Huberman’s interview with Tim Ferriss.
The great thing about this setup is that you can easily find podcast episodes relevant to a specific topic or question using semantic search in Notion. You don't have to rely on metadata or manually assigned tags.
It just blows my mind that this information processing workflow requires basically no effort. Whenever I come across an interesting idea, I double-tap my left AirPod and Snipd creates a snip; the rest of the process is automated, including note-taking, title generation, and the sync to Notion. I only ever interact with Notion when performing a semantic search.
Holy Grail: One Database, One Semantic Search Layer
Ideally, all of your media notes should be stored in the same place. In Notion databases, properties are often used for search purposes. While they have other uses, such as status management, properties are primarily there to enable search based on metadata. Once your documents contain enough content to perform semantic searches, the need for properties diminishes.
Imagine a media database that integrates with various podcast apps, article and blog note platforms, as well as book note platforms. These integrations populate the database with content from different media types, reducing the reliance on metadata. At this point, all you need is a way to search through the relevant information extracted from all media types, regardless of their sources. This simple and effective system for information processing can be achieved by consolidating all notes from different media types into a single database and implementing a semantic search layer.
I plan to explore this approach in the future as I consolidate my notes from different media types into one database and apply a semantic search layer on top of it. This will provide me with the necessary tools to search for relevant information based on my questions or topics of interest. It doesn't matter where the information came from; what matters is that it is present in the document.
Conclusion
In summary, the key steps to optimize information processing in Notion are:
Centralizing all notes and content from various media types into a single database within Notion.
Implementing a semantic search layer using Notion AI properties to enable efficient searching based on the content itself, rather than relying on extensive metadata.
Continuously refining and testing the semantic search queries to ensure accurate and relevant results.
This design pattern makes information search in Notion more manageable and powerful, thanks to modern AI. I recommend getting the Notion AI subscription so you can implement patterns like this one. It will save you a lot of time and is definitely worth the investment. Semantic search is just one of my many use cases for Notion AI. It has a lot of potential, and I'm excited to explore it further. I hope this helps you be more efficient in searching for information in Notion databases, and let me know how it goes!