Draft edited September 20, 2023 10:11 PM (PDT)
This page hosts a provisional comparative evaluation of new web search systems, with a focus on dimensions like curiosity-engagement, question-generation, response validation, search repair, search sharing, complaint and feedback, and other concerns at the core of my research.
The new attention to search, sparked by widespread interest in OpenAI's ChatGPT, has pushed many to develop new ways to search.
This evaluation compares different search systems, with a particular eye towards finding, and shining some light on, systems that are opening up search in new ways.
I have a wide range of criteria by which to provide some marks and remarks on these systems. I’ll need to narrow them down and gradually work through them. I’m not thinking about them as goals, requirements, or desired specifications, and some may even contradict one another. For some of the criteria I will provide citations as reference or support. Some criteria are drawn from examples in previous search systems (including shuttered, speculative, and experimental systems). My goal here is not simply to do an accounting of searching today, but to get some sense of where we might want search to go.
In this initial set of reviews, I’m focusing on these search engines, listed alphabetically:
These are my initial examples of new approaches to searching in generative web search systems. I may provide some contextualizing comments about other systems, like the explicit search-focused tools from Google and Microsoft, and chat-based systems like ChatGPT, Anthropic’s Claude, etc., that support search and search-like interactions.
To the extent that they support public-facing search, I will also be examining newer search libraries and services (including RAG frameworks), like the offerings from LangChain, LlamaIndex, and Weaviate’s Verba, with comparison to (the also adapting) existing tools like those from Algolia and Elasticsearch.
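To make concrete what these RAG frameworks wrap, here is a minimal sketch of a retrieval-augmented generation loop in plain Python. The `retrieve` and `answer` functions and their parameters are hypothetical stand-ins for illustration, not any particular library’s API:

```python
# A minimal sketch of a retrieval-augmented generation (RAG) pipeline.
# `search_index` and `llm_complete` are hypothetical stand-ins for
# whatever retriever and language model a framework actually wires up.

def retrieve(query: str, search_index, k: int = 3) -> list[str]:
    """Return the k passages the index scores highest for the query."""
    return search_index.top_k(query, k)

def answer(query: str, search_index, llm_complete) -> str:
    """Stuff retrieved passages into a prompt and generate a response
    that is (ideally) grounded in those passages."""
    passages = retrieve(query, search_index)
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the sources below.\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return llm_complete(prompt)
```

Frameworks like LangChain and LlamaIndex layer much more around this core (chunking, reranking, citations), but this basic retrieve-then-generate shape is what my questions about validation and repair sit on top of.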
My criteria are broad. I’m focused on concerns my research and training best prepare me to engage with. These are broadly questions related to the explicit and implicit articulation of the search system, the interactions around queries and results, the ability to share the burden of search, and the formalized methods of complaint. I’ll do some explicit evaluations of atomic performance related to “hallucination” or “groundedness”, but my focus is more on how people perceive and perform-with tool outputs than on the outputs themselves. How are searchers ushered into their searches? What do they see as searchable? How can they engage with search results (or responses)? Are they expected to vet the responses for hallucinations? How is automation bias addressed? What post-search activities are supported by the search system itself?
I’ll ask about features or uses that might be refused or reimagined, while situating this period of search amidst a longer history of search. There are important concerns about misleading results, sources of training and reference data, oversight, and the future of work. I’m very much developing these reviews to acknowledge that these systems will keep changing. Where there are very important concerns that I am less well-versed in, like accessibility, I will leverage other resources.
What searches can we avoid doing?
What can we newly / more easily think to do?
What is made newly possible?
What can be slower or faster?
Seamful? Viscid? Vetted?
Doubtful & deliberate?
Ephemeral or persistent?
Memorable? Public? Shared?
Surfing or blazing?
Embedded? Loosely coupled?
Fun?
This is not intended to be an introductory guide to these systems; rather, it is focused on making sense of what new search tools are providing and what they might become. These reviews may be useful to heavy users, developers, and others looking to understand changes in system support for various searching practices.
I will largely be looking at systems for web search, including those more focused on particular subject areas. Though important, these reviews will not (yet at least) engage with new search systems for:
I will not be trying to replicate metrics like those in ragas (tagline: “Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines”; from Exploding Gradients). Those metrics are intended to examine: Faithfulness, Context Relevancy, Context Recall, Answer Relevancy, and Aspect Critiques. I’m more interested in actions outside the RAG pipeline itself. What can be done to improve coordination between searcher expectation and system performance? What can be done to remedy system failures at search time?
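For reference, a ragas evaluation runs roughly like the sketch below, based on my reading of the ragas documentation at the time of writing; the dataset columns and metric names here are assumptions that may shift across versions:

```python
# A rough sketch of scoring a RAG pipeline's outputs with ragas, based
# on its documentation at time of writing; the column and metric names
# are assumptions and may differ across versions.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_recall, faithfulness

# One example record: the query, the pipeline's answer, the passages
# it retrieved, and a reference answer for recall-style metrics.
samples = Dataset.from_dict({
    "question": ["Who proposed the berrypicking model of search?"],
    "answer": ["Marcia Bates proposed the berrypicking model in 1989."],
    "contexts": [["Bates (1989) described searching as berrypicking ..."]],
    "ground_truths": [["Marcia J. Bates"]],
})

results = evaluate(samples, metrics=[faithfulness, answer_relevancy, context_recall])
print(results)  # per-metric scores of the pipeline's atomic performance
```

Such scores stop at the pipeline boundary; my questions start there, with what searchers and systems can do when those scores would come out low.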
I will also not be focused on in-editor code generation tools (like GitHub’s Copilot) and writing tools (like Lex.page) that replace or subsume some searching tasks.
I will not be very focused on various metrics related to speed, unless it is very noticeable in frequent use.
I am concerned about questions of bias, but here only insofar as these systems present problems markedly different from those previously found in search.
I am not focused on the explainability or transparency of these systems, though some of my questions will certainly engage with those topics. I will be more focused on examining questions around seamfulness, tractability, and traceability. I will be thinking about how practical algorithmic knowledge [@cotter2022practical] is built up and valued.
I’m less focused on responding to, or rehashing, arguments about “model collapse” than on looking at how these search tools and their users imagine supporting or working towards unsealing knowledge, whether through articulations that help users doubt & dig deeper, providing multiple drafts, or RAG adaptations.
The most important work would look at how these search systems and tools are imagined and used (or not) by other people. I am not doing that right now, but I will look at aspects of the systems identified publicly by different users or others.
This is presently very much under construction, much more a speculative prototype than a full-fledged system. A proper system will require many more people working together. I am not proposing I have the “solution” or could build out a framework like this alone. This is just part of my joining the conversation about what search might be.
Add to system pages (background, reference data, reflections); scraping; interviews and guest commentary; glossary; reading list; suggested class exercises and discussion questions; add search functionality.
This project is not externally funded. Daniel Griffin, Ph.D., is pursuing this work in the course of his research and job search.