Blog / January 17, 2025

Understanding the Technical Components of Moveworks’ Live Search

Yiyuan Zhang, Senior Manager, Engineering

Charles Meng, Senior Product Manager

As part of Moveworks’ Enterprise Search, we are excited to introduce Live Search. With Live Search, Moveworks’ agentic RAG capabilities leverage each platform’s native search API to help employees get real-time answers and find information from more internal applications than ever before.

In this article, we describe the advantages of Live Search, specifically on content scale, data freshness, permissions-support, and security.

We also describe the built-in challenges of using Search APIs compared to traditional indexing methods, and how we implement state-of-the-art optimizations on retrieval, search ranking, and agentic RAG to dramatically outperform their out-of-the-box search performance.

What is Moveworks Live Search?

Moveworks Live Search offers an additional method of performing information retrieval, complementary to Moveworks' index-based search capabilities.

With Live Search, at the moment of a user's query, our system queries directly against your application's environment, say Google Calendar or Salesforce, using the platform's native search APIs. This approach contrasts with index-based search, where the search system queries against a mirrored copy of your application's data that has been ingested from your platform.

Say an employee searches for something, like "What meetings do I have next week?" Instead of querying against an indexed copy of the employee's calendar, our system queries for information using Google's calendar search API.

Live Search’s advantages and challenges

As described above, Live Search leverages various integrations' Search APIs. This approach comes with various advantages and challenges.

Advantages of Search APIs
The key advantages of using Search APIs are on scale, data freshness, access control permissions, and security.

Scale: With the amount of content that enterprises have, indexing everything can impose significant costs on processing and storage. Live Search complements the millions of files our search infrastructure can support, to enable Moveworks Enterprise Search to offer hyper-efficient coverage of all internal content in these systems, even at the world's largest enterprises.
Data Freshness: While our indexing pipelines are capable of syncing with up to order-of-minutes freshness, Live Search allows content available in Moveworks Search to constantly stay in-sync with what's in your system itself. For an integration like Google Calendar, you can expect changes like adding a new calendar, changing meeting participants, etc. to reflect almost instantly in Moveworks search.
Access Control Permissions: In parallel to our offering that supports index-based, "mirroring" ACL permission support for our indexed integrations (Sharepoint, Google Drive, Box, Confluence, Dropbox, ServiceNow, etc.), all new Live Search integrations always respect users' access controls by leveraging their personal user tokens.
Security: Your enterprise benefits from elimination of data redundancy. While indexing requires content to be ingested and stored in an external search index, creating additional duplicates of your data, Live Search operates without having to duplicate all your enterprise content.
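
To make the freshness and permissions points concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Moveworks' implementation: `SourceApp`, `IndexedSearch`, and `LiveSearch` are hypothetical names, and a plain string stands in for a personal user token.

```python
class SourceApp:
    """Hypothetical stand-in for an application with its own search API."""

    def __init__(self):
        self.items = []  # each item: {"title": str, "allowed": set of user tokens}

    def search(self, keyword, user_token):
        # The application itself enforces access control for the calling user.
        return [
            item["title"]
            for item in self.items
            if keyword.lower() in item["title"].lower()
            and user_token in item["allowed"]
        ]


class IndexedSearch:
    """Queries a mirrored copy that is only as fresh as the last sync."""

    def __init__(self, app):
        self.app = app
        self.snapshot = []

    def sync(self):
        # Content and ACLs are both copied ("mirrored") into the index.
        self.snapshot = [
            {"title": item["title"], "allowed": set(item["allowed"])}
            for item in self.app.items
        ]

    def search(self, keyword, user_token):
        return [
            item["title"]
            for item in self.snapshot
            if keyword.lower() in item["title"].lower()
            and user_token in item["allowed"]
        ]


class LiveSearch:
    """Forwards the query to the application at request time; stores nothing."""

    def __init__(self, app):
        self.app = app

    def search(self, keyword, user_token):
        return self.app.search(keyword, user_token)


app = SourceApp()
app.items.append({"title": "Weekly sync", "allowed": {"alice", "bob"}})
indexed, live = IndexedSearch(app), LiveSearch(app)
indexed.sync()
# A meeting created after the last sync is visible to Live Search only.
app.items.append({"title": "Budget sync", "allowed": {"alice"}})
```

In this toy setup, `indexed.search("sync", "alice")` returns only the meeting that existed at sync time, while `live.search("sync", "alice")` also returns the new one, and `live.search("sync", "bob")` omits the meeting that bob is not allowed to see.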

Challenges of Search APIs

However, using Search APIs can come with disadvantages. Compared to indexed search, the two main drawbacks of Search APIs are:

Less control over speed: Deferring retrieval to an external system's API surrenders some degree of control over how fast the query can execute.
Limited to naive keyword retrieval: Most Search APIs are available out-of-the-box with very limited keyword search. To put these limitations into perspective, each platform's search API is about as good (or bad) as its native search bar – just recall the last time you tried to find information in Slack, Sharepoint, Confluence, etc.

We discuss next how we designed our Live Search to capitalize on the advantages, while delivering on speed and search quality.

How we built Live API Search

To maximize the benefits of real-time API access, we needed to mitigate its natural challenges on search quality and speed. Let’s dive into how we do it, with five main areas of investment:

Query Understanding
Integration-Specific Query Rewrites
Resource Retrieval
Ranking
Summary Generation

1. Query Understanding

Goal: Understand user context/intent and optimize retrieval

With Search APIs, you’ll find that naively sending a user’s natural language query will often miss the mark, yielding little to no relevant information. This is largely due to the significant limitation of Search APIs, which generally operate as basic, keyword-only search systems. For example, try asking a question in your Slack or Sharepoint search bar the same way you would ask a colleague or subject matter expert. It likely won’t work.

With Moveworks Live Search, we’ve invested heavily in query understanding to completely outclass the out-of-the-box search quality of Search APIs, and each platform’s native search bar.

The goal of query understanding in an enterprise search system is to accurately interpret a user’s intent, translate it into actionable search operations, and efficiently deliver relevant, precise results.

We have implemented a set of optimizations that allow our system to “understand” all the components of the user’s query. This includes spell-checking, stemming and lemmatization, entity and synonym expansion for domain-specific and organization-specific terminology, as well as extracting key facets, like document type or date, from the user’s query.

Let’s take an example: Say a user requests “find me documents on ACLs from Yiyuan”. Leveraging our knowledge of internal entity names and org-specific terms, we understand that “ACL” is an acronym for “Access Control List”, and that it is described colloquially as “permissions support.”

We’ll also extract filters provided inside the query – in this case our system can understand that “documents” are the desired content type of search results, and that of those documents the user expresses a preference for ones authored by the user “Yiyuan Zhang”.

Later in the ranking step, this type of upstream understanding to extract “documents” allows our system to prioritize searching against “document-centric” integrations, for example Google Drive, Sharepoint, and Confluence. The upstream extraction of “Yiyuan” as an author allows our downstream ranking to prioritize specifically for content authored by Yiyuan.

As part of understanding a user’s intent, we also reference key information we know about the querying user and their past activity. This can include information such as the user's role within the organization, their department, past search history, location, and more.
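
The facet extraction and acronym expansion described above can be sketched in a few lines of Python. This is a rule-based illustration only, not the production approach: the lookup tables, the `ParsedQuery` type, and the crude `singular` helper are all hypothetical stand-ins for the spell-checking, lemmatization, and entity models mentioned in this section.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical lookup tables; a production system would learn or configure these.
ACRONYMS = {"acl": ["access control list", "permissions support"]}
CONTENT_TYPES = {"document": "document", "doc": "document", "ticket": "ticket"}
STOPWORDS = {"find", "me", "on", "from", "by", "the", "a", "about", "for"}


@dataclass
class ParsedQuery:
    keywords: List[str]
    expansions: List[str]
    content_type: Optional[str]
    author: Optional[str]


def singular(token: str) -> str:
    """Very crude stand-in for stemming: strip one trailing 's'."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def parse_query(raw: str) -> ParsedQuery:
    # Facet: a capitalized name after "from" or "by" is treated as the author.
    match = re.search(r"\b(?:from|by)\s+([A-Z][\w-]*)", raw)
    author = match.group(1) if match else None

    keywords, expansions, content_type = [], [], None
    for token in re.findall(r"[a-z0-9]+", raw.lower()):
        stem = singular(token)
        if author and token == author.lower():
            continue  # already captured as the author facet
        if stem in CONTENT_TYPES:
            content_type = CONTENT_TYPES[stem]  # facet: desired result type
        elif token not in STOPWORDS:
            keywords.append(stem)
            expansions.extend(ACRONYMS.get(stem, []))
    return ParsedQuery(keywords, expansions, content_type, author)


parsed = parse_query("find me documents on ACLs from Yiyuan")
```

For the example query, `parsed` holds the keyword `acl`, the two expansions from the acronym table, the content type `document`, and the author `Yiyuan`. Resolving `Yiyuan` to a full user identity would need a directory lookup, which this sketch leaves out.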

2. Integration-specific Query Rewrites

Goal: User context incorporation

To specifically address the limitations of search APIs as naive-keyword retrieval systems, we have implemented “agentic” capabilities during the retrieval stage in our search system. Using the context of the user and their search query, our system performs integration-specific query rewrites of the original query. During this step, our system generates new queries, in addition to the original query, to interact more effectively with keyword-centric search APIs. The integration-specific rewriting dramatically improves recall by adapting the query structure to match the specific syntax, filters, and parameters required by each system.

Using the previous example, while the original query was only “find me documents on ACLs from Yiyuan”, the search itself may execute with multiple, contextually-relevant queries such as “access controls”, “ACL technical design documents”, “access control list information”, “permissions designs”, “search permissions support”, and more. Each rewrite is intended to interact better with the keyword-centric search APIs, and collectively the rewrites cast a wider net during retrieval.
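
As a rough illustration of what "integration-specific" means here, the sketch below turns one parsed intent into different query shapes for two made-up search APIs. Everything in it is an assumption for illustration: the `Intent` type, the `operator_api` and `json_api` formats, and the `author:` and `type:` operators do not correspond to any real platform, and the deterministic rules stand in for the agentic rewriting described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Intent:
    keywords: Tuple[str, ...]
    expansions: Tuple[str, ...]
    content_type: Optional[str] = None
    author: Optional[str] = None


def candidate_phrases(intent):
    """Original keywords first, then expansions, de-duplicated in order."""
    seen = []
    for phrase in (" ".join(intent.keywords), *intent.expansions):
        if phrase and phrase not in seen:
            seen.append(phrase)
    return seen


def rewrite_for_operator_api(intent):
    """Hypothetical search-bar style syntax with inline filter operators."""
    queries = []
    for phrase in candidate_phrases(intent):
        parts = [f'"{phrase}"' if " " in phrase else phrase]
        if intent.author:
            parts.append(f"author:{intent.author}")
        if intent.content_type:
            parts.append(f"type:{intent.content_type}")
        queries.append(" ".join(parts))
    return queries


def rewrite_for_json_api(intent):
    """Hypothetical structured payload with filters in a separate field."""
    pairs = (("author", intent.author), ("type", intent.content_type))
    filters = {key: value for key, value in pairs if value}
    return [{"q": phrase, "filters": filters} for phrase in candidate_phrases(intent)]


REWRITERS = {"operator_api": rewrite_for_operator_api, "json_api": rewrite_for_json_api}


def rewrite_all(intent):
    return {name: rewrite(intent) for name, rewrite in REWRITERS.items()}


intent = Intent(("acl",), ("access control list", "permissions"), "document", "Yiyuan")
rewrites = rewrite_all(intent)
```

Each integration receives several queries (one per candidate phrase), which is what widens recall, while the filters are expressed in whatever form that integration accepts.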

3. Resource Retrieval

Goal: Improve search speed and handle scale

This part of our system mitigates the speed concerns associated with Live API Search and carries out the retrieval that the query-understanding stage has prepared.

We have built a robust resource retrieval system designed to efficiently handle multiple parallel calls across various APIs, ensuring high throughput. The system intelligently selects which APIs to query based on the results of query understanding, optimizing system load and improving latency. It seamlessly retrieves data from both indexed sources and live APIs, delivering fast and comprehensive access to all your content.
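
One common way to get this behavior is to fan out concurrent calls with a per-call timeout, so that a slow API cannot hold up the whole response. The Python sketch below shows that pattern with `asyncio`; it is an assumption about the general technique, not a description of our internals. The routing table, integration names, and `fake_search` (which simulates latency instead of making an HTTP request) are hypothetical.

```python
import asyncio

# Hypothetical routing table from extracted content type to integrations.
ROUTES = {"document": ["drive", "wiki"], "event": ["calendar"]}
ALL_INTEGRATIONS = ["drive", "wiki", "calendar", "chat"]


def select_integrations(content_type):
    """Query only the integrations likely to hold the requested content."""
    return ROUTES.get(content_type, ALL_INTEGRATIONS)


async def fake_search(name, query, delay):
    """Stand-in for an HTTP call to one integration's search API."""
    await asyncio.sleep(delay)
    return [f"{name}: result for '{query}'"]


async def fan_out(query, delays, content_type=None, timeout=0.2):
    """Call the selected integrations concurrently, dropping slow ones."""

    async def guarded(name):
        try:
            hits = await asyncio.wait_for(
                fake_search(name, query, delays[name]), timeout
            )
        except asyncio.TimeoutError:
            hits = []  # a slow API should not hold up the whole response
        return name, hits

    names = select_integrations(content_type)
    return dict(await asyncio.gather(*(guarded(n) for n in names)))


# Simulated latencies in seconds; "wiki" is slower than the timeout.
delays = {"drive": 0.01, "wiki": 1.0, "calendar": 0.01, "chat": 0.01}
results = asyncio.run(fan_out("access controls", delays, content_type="document"))
```

Because the query asked for documents, only `drive` and `wiki` are called. `drive` answers in time, and `wiki` exceeds the 0.2 second budget and contributes an empty list, so total latency is bounded by the timeout and not by the slowest API.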

4. Ranking

Goal: Serve relevant, up-to-date, diverse, and personalized results

Once we have retrieved results from the Live Search systems, we blend them together in a series of re-ranking steps, in order to present the best possible options from both our Live Search results and the indexed-search results from our search indices.

On ranking, we have built a multi-stage search system that integrates traditional information retrieval signals like BM25 with our in-house bi-encoder and cross-encoder models to deliver highly accurate and relevant search results with minimal latency.

Our ranking system also personalizes and prioritizes search results based on context we know about the user. This can include information such as the user's role within the organization, their department, past search history, geolocation, and more. From the upstream query-understanding steps, our ranking system also dynamically applies “soft filters” based on those extracted from the user’s query, to better serve results that align with the user’s intent, such as for a specified date range (“last week”), a specified author (“from Yiyuan”), or a specified integration (“from Salesforce”), or resource type (“find me documents”).

Additionally, we prioritize content recency, ensuring users receive the most up-to-date information, while also maintaining result diversity to provide a broad range of relevant content.

This multi-stage approach ensures both high accuracy and low latency, offering fast and comprehensive access to all pertinent information.
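
To show how a lexical signal and "soft filters" can combine, here is a small Python sketch. It covers only two pieces of the pipeline described above: a textbook Okapi BM25 score and a multiplicative boost for results that match extracted facets. The bi-encoder and cross-encoder stages, recency, and diversity are left out, and the boost value and result fields are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the given query terms."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query_terms, results, soft_filters, boost=0.5):
    """Boost, but never drop, results that match the extracted facets."""
    base = bm25_scores(query_terms, [r["text"] for r in results])
    scored = []
    for result, score in zip(results, base):
        matches = sum(1 for key, value in soft_filters.items() if result.get(key) == value)
        scored.append((score * (1 + boost * matches), result))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in scored]


results = [
    {"id": "a", "text": "access control list design", "author": "Sam"},
    {"id": "b", "text": "access control list design", "author": "Yiyuan"},
    {"id": "c", "text": "lunch menu", "author": "Yiyuan"},
]
ranked = rank(["access", "control"], results, {"author": "Yiyuan"})
ranked_ids = [r["id"] for r in ranked]
```

Documents `a` and `b` have identical text and therefore identical BM25 scores, so the author match is what lifts `b` above `a`. Document `c` matches the author but none of the query terms, so it stays last: a soft filter reorders relevant results without excluding the others.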

5. Summary Generation Optimizations

Goal: Generate trustworthy, transparent answers

Finally, on the LLM-generation step for both the search experience in Assistant and the Summary on Search Mode, our system leverages advanced summary generation techniques, utilizing map-reduce to efficiently handle large volumes of data and produce concise, accurate summaries. This ensures that even in complex enterprise environments with vast amounts of unstructured data, users receive clear and actionable insights.

To enhance transparency and traceability, we include citations within the summaries, linking back to the original sources of information. Throughout the summary generation process, our system incorporates a series of fact-checking guardrails, cross-referencing data across multiple sources to verify accuracy and ensure the information is reliable and trustworthy.
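
The map-reduce pattern with citations can be sketched as follows. This is a simplified Python illustration, not the production pipeline: `first_sentence` is a trivial stand-in for an LLM summarization call, the reduce step simply joins partial summaries where a real system would likely summarize again, the `example.com` URLs are placeholders, and the fact-checking guardrails are not modeled.

```python
def first_sentence(text):
    """Trivial stand-in for an LLM call that summarizes one source."""
    return text.split(". ")[0].rstrip(".") + "."


def map_reduce_summary(sources, summarize=first_sentence, batch_size=2):
    """Summarize each source (map), then merge in batches (reduce)."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 for the reduce step to converge")
    # Map: one partial summary per source, tagged with its citation number.
    partials = [
        f"{summarize(source['text'])} [{number}]"
        for number, source in enumerate(sources, start=1)
    ]
    # Reduce: merge neighbors until a single summary remains.
    while len(partials) > 1:
        partials = [
            " ".join(partials[i:i + batch_size])
            for i in range(0, len(partials), batch_size)
        ]
    citations = {number: source["url"] for number, source in enumerate(sources, start=1)}
    return (partials[0] if partials else ""), citations


sources = [
    {"text": "ACLs restrict access. They are set per file.", "url": "https://example.com/a"},
    {"text": "Tokens carry user identity.", "url": "https://example.com/b"},
    {"text": "Indexes mirror content. Syncs run on a schedule.", "url": "https://example.com/c"},
]
summary, citations = map_reduce_summary(sources)
```

Because each citation marker is attached during the map step, it survives every merge, and the final summary can link each statement back to the source it came from through the `citations` mapping.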

This combination of cutting-edge technology and rigorous validation allows users to confidently rely on the generated summaries for informed decision-making.

Live Search: A new milestone in Moveworks Enterprise Search

With the right combination of query understanding, retrieval, ranking, and summary-generation optimizations, Moveworks has built a Search product that maximizes the benefits of Live Search APIs on scale, data freshness, security, permissions, and cost, all while dramatically outperforming the out-of-the-box abilities of Search APIs on search quality and speed.

Our optimized approach to Live Search ensures you get accurate, timely, and secure search results, all while managing limitations effectively. It’s an offering and technical feat that’s unique in the market today, and a component we are proud to bring as part of Moveworks Enterprise Search.

Help employees find the right information faster. Learn more about the power of Moveworks Enterprise Search for yourself.


This posting does not necessarily represent Moveworks’ position, strategies or opinion.