
The role of large language models in search


November 28, 2023




A classic problem in artificial intelligence is to develop a machine that will understand human language. For example, when we search for "Italian restaurants near us" on our favorite search engine, the algorithm must analyze each word in the query and return relevant results. 

Language understanding, language generation, and related tasks fall under the subfield of machine learning known as Natural Language Processing (NLP). Advances in NLP have led to a wide range of practical applications, from virtual assistants like Amazon's Alexa to spam filters that detect malicious email. The latest development in NLP is the large language model, or LLM. LLMs like GPT-3 have become so powerful that they seem to succeed at almost any NLP task or use case. In this article, we'll look at exactly what LLMs are, how these models are trained, and their prospective applications.

Understanding Large Language Models 

At its core, a language model is an algorithm that knows how likely it is that a string of words is a valid utterance. A very simple language model trained on several hundred books should be able to say that "He went home" is more valid than "He went house". If we replace the relatively small data set with a huge data set from the Internet, we begin to approach the idea of a large language model.
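To make this concrete, here is a toy bigram language model, a minimal sketch in which the tiny corpus and the add-one smoothing are invented for illustration, that scores "He went home" above "He went house":

```python
from collections import Counter

# A toy bigram language model: scores a sentence by how often its
# word pairs appear in the training text. The corpus is illustrative.
corpus = (
    "he went home . she went home . "
    "they went home early . he built a house . "
    "she bought a house ."
).split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def score(sentence):
    """Product of bigram probabilities P(w_i | w_{i-1}) with add-one smoothing."""
    words = sentence.lower().split()
    vocab_size = len(unigrams)
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

print(score("He went home") > score("He went house"))  # True
```

Real LLMs replace these raw counts with a neural network and a vastly larger corpus, but the underlying question, "how probable is this word sequence?", is the same.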

To train LLMs, researchers apply deep neural networks to large amounts of textual data. During the pre-training phase, the model ingests existing text to learn the general structure and rules of the language. Because of the sheer amount of text it has seen, the LLM becomes very good at predicting the next word in a sequence. The model becomes so sophisticated that it can perform many NLP tasks, such as summarizing text, creating new content, and even simulating human conversation.

Over the past few years, LLMs have been pre-trained on data sets that cover a significant portion of the public Internet. For example, the GPT-3 language model was trained on data from the Common Crawl dataset, a corpus of web posts, web pages, and digitized books collected from over 50 million domains. GPT-3 has 175 billion parameters and was the most advanced language model until recently (it has since been superseded by GPT-4). It can generate working code, write entire articles, and attempt to answer questions on any topic.

The massive data set is then fed into a model known as a transformer - a type of deep neural network well suited to sequential data. Transformers use an encoder-decoder architecture to handle input and output. In essence, the transformer contains two neural networks: an encoder and a decoder. The encoder extracts the meaning of the input text and stores it as a vector. The decoder then receives the vector and produces its interpretation of the text.

The key concept behind the transformer architecture's success is the addition of a self-attention mechanism. Self-attention allows the model to uncover the most important words in a given sentence: the mechanism assigns each pair of words a weight that reflects how relevant they are to each other. Another advantage of self-attention is that the process can be parallelized - instead of processing sequential data in order, transformer models can process all inputs at once. This allows transformers to train relatively quickly on large amounts of data compared to other methods.
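A minimal sketch of the scaled dot-product self-attention at the heart of a transformer, using NumPy with random matrices standing in for the learned projections (the sequence length and embedding size are arbitrary choices for illustration):

```python
import numpy as np

# Scaled dot-product self-attention over a toy "sentence" of 4 tokens,
# each embedded as an 8-dimensional vector.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))            # token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q, K, V = x @ W_q, x @ W_k, x @ W_v                # queries, keys, values

scores = Q @ K.T / np.sqrt(d_model)                # pairwise relevance of tokens
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax

output = weights @ V                               # each token: weighted mix of all tokens

print(weights.shape, output.shape)  # (4, 4) (4, 8)
```

Note that every token attends to every other token in a single matrix multiplication - this is the property that lets transformers process the whole input in parallel.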

After the pre-training phase, we can train the base LLM further on new text. We call this process fine-tuning, and it is often used to improve the LLM's performance on a particular task. For example, you might want to use an LLM to generate content for your Twitter account. You can provide the model with a few examples of your previous tweets to give it an idea of the desired result.
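Fine-tuning data is commonly prepared as prompt/completion pairs. The sketch below shows one such preparation step; the JSONL format is a common convention rather than any specific provider's requirement, and the tweets are invented placeholders:

```python
import json

# Turn previous tweets into prompt/completion pairs in JSONL, a common
# input format for LLM fine-tuning. The tweets are placeholders.
tweets = [
    "Shipping a new search feature today.",
    "Our index just crossed one million documents!",
]

def to_jsonl(examples, prompt="Write a tweet:"):
    return "\n".join(
        json.dumps({"prompt": prompt, "completion": t}) for t in examples
    )

print(to_jsonl(tweets))
```

The resulting file would be uploaded to whatever fine-tuning service or training loop you use, which then adjusts the pre-trained weights toward your style.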

Applications in Various Industries (Media, E-commerce, EdTech)

As search engines get better at understanding the meaning of queries and people discover the benefits of AI-powered search, certain industries can significantly improve their market position.

Media websites are thriving today – from video news to how-to tutorials, their content is ever-growing and changing. They can utilize LLMs for intelligent content creation and curation: engaging headlines, compelling stories, and improved content quality. This enables media companies to streamline their content production process. They can also apply LLMs to targeted advertising and content monetization.

Online buyers usually know what they want, but in e-commerce LLMs can be used to answer customer inquiries about products, services, delivery, and more. LLMs can also guide customers through product selection by showcasing particular items and helping with the checkout process. They can even detect fraud, such as identity theft and credit card fraud.

Universities and EdTech providers can benefit from LLMs by offering personalized learning content. They can also create specific exercises and activities (quizzes, problem tasks) for each individual student. LLMs can translate textbooks into multiple languages, so students can study the material in their native language. Finally, LLMs can be used to grade students' work (tests, essays), saving teachers time so they can focus on more creative work.

Impact on Search Relevance and Personalization

Site search functionality is of crucial importance for many businesses, so it should comprehend users' input accurately and provide relevant search results. LLMs can be used in multiple search relevance and personalization tasks:

  • Extracting summaries and keywords that accurately represent the website content, thus helping search engines to provide more relevant and useful search results to users.
  • Detecting and correcting misspelled words in a query and expanding the input text.
  • Search embeddings - interpreting the semantic meaning of the query and combining it with an LLM to return relevant sentences from shortlisted documents.
  • Query tokenization and expansion - breaking the query down into individual units, known as tokens, by splitting on whitespace, then consulting a query expansion cache to identify potential spelling corrections and abbreviation expansions for those tokens.
  • Query understanding – text-to-SQL translation and querying a database.
  • Index construction – summarizing the index, extracting structured data, text chunking, managing document updates.
  • Online dialog systems - LLMs can be used to generate responses that are appropriate, coherent, and natural-sounding in a given context.
  • Forward-Looking Active REtrieval augmented generation (FLARE) - generating a tentative next sentence, and if the probability of any of its words falls below a certain threshold, triggering a search query with the preceding context.
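To illustrate the embedding-based items above, here is a toy semantic search that ranks documents by cosine similarity to the query. The bag-of-words "embedding" and the sample documents are stand-ins for a real LLM embedding model and corpus:

```python
import numpy as np

# Illustrative semantic search: documents and the query are embedded as
# vectors (here a toy bag-of-words; in practice an LLM embedding model),
# then ranked by cosine similarity.
docs = [
    "italian restaurants and pizza places nearby",
    "how to train a neural network",
    "best pasta dishes in town",
]
query = "where can i eat italian food"

vocab = sorted({w for text in docs + [query] for w in text.split()})

def embed(text):
    words = text.split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

q = embed(query)
ranked = sorted(docs, key=lambda d: cosine(embed(d), q), reverse=True)
print(ranked[0])  # "italian restaurants and pizza places nearby"
```

With real embeddings, the restaurant document would also rank first for queries that share no words with it at all (e.g. "places to get pasta"), which is exactly what keyword matching cannot do.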


Large language models (LLMs) are becoming ubiquitous across multiple sectors, from e-commerce and media to education and finance. They are highly capable at content creation, personalization, optimization, market research, and competitor analysis. LLMs will greatly affect business and technology because they can understand and create content that sounds human. They can be applied in many ways to help businesses gain valuable insights and make better decisions.

Here you can check out how to adopt Omnisearch and implement full site search over text, PDFs, audio, and video content. To kick off the implementation of your own AI-based site search, start by contacting us for a quick demo.
