Building a Quranic Text Retrieval System with Generative AI
In this blog post, I will walk you through my recent experiment involving the creation of a Quranic text retrieval system using generative AI. This project leverages various technologies, including Gradio for the web interface, Qdrant for the vector database, and LM Studio as the local LLM server.
Project Overview
The goal of this experiment was to build a system that can store and retrieve Quranic verses based on semantic similarity. The core components of this system include:
- Data Loading and Preprocessing: Loading Quranic verses and their translations from a JSON file.
- Vectorization: Converting text into vector embeddings using a pre-trained model.
- Storage: Storing these embeddings in a vector database (Qdrant).
- Retrieval: Retrieving the most semantically similar verses based on user queries.
- Interactive Interface: Using Gradio to create an interactive interface for vectorizing data and querying the database.
Step-by-Step Code Explanation
Let's dive into the code step-by-step. You can copy and paste the code blocks into a Jupyter Notebook to follow along.
1. Importing Required Libraries
import json
import gradio as gr
import ray
import uuid
import warnings
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance
from openai import OpenAI
warnings.simplefilter(action='ignore', category=FutureWarning)
Here, we import all the necessary libraries. Ray is used for parallel processing, SentenceTransformer for generating text embeddings, and QdrantClient for interacting with the Qdrant vector database. The OpenAI client is used only to talk to the local LM Studio server, which exposes an OpenAI-compatible API.
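If you are following along in a fresh notebook, the packages behind these imports can be installed first (the names below are the usual PyPI package names):

%pip install gradio ray sentence-transformers qdrant-client openai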
2. Initializing Ray
ray.init()
Ray is initialized to enable parallel processing of tasks, which is particularly useful for computing embeddings.
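If you want to limit how many CPU cores Ray claims, or re-run the cell in a notebook without an error, you can pass a couple of optional arguments; the values here are illustrative:

# Optional: cap Ray's CPU usage and tolerate repeated ray.init calls in a notebook.
ray.init(num_cpus=4, ignore_reinit_error=True)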
3. Computing Embeddings
@ray.remote
def compute_embeddings(text, model_name):
    model = SentenceTransformer(model_name)
    return model.encode(text)
This function computes the embeddings for a given text using a specified model. By using Ray, we can run this function in parallel across multiple cores. This is especially beneficial when calculating embeddings for large datasets to be inserted into a vector database. In our case, we create embeddings for all the Quranic verses.
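To actually fan the work out across cores, submit several Ray tasks before collecting any results. A minimal sketch, with placeholder texts:

# Submit all embedding tasks first, then collect the results in one ray.get call.
texts = ["First verse text ...", "Second verse text ...", "Third verse text ..."]
model_name = "sentence-transformers/all-MiniLM-L6-v2"

futures = [compute_embeddings.remote(t, model_name) for t in texts]  # tasks run in parallel
embeddings = ray.get(futures)  # blocks until every task has finished

Note that each task reloads the SentenceTransformer model, so batching many verses per task reduces overhead on large datasets.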
4. Processing and Storing Verses
def process_and_store_verses(json_file_path, model_name):
    try:
        with open(json_file_path, 'r', encoding='utf-8') as file:
            data = json.load(file)
    except Exception as e:
        return f"Error reading JSON file: {e}"

    url = "http://localhost:6333"
    client = QdrantClient(url)
    collection_name = "quran_collection"

    try:
        client.get_collection(collection_name)
    except Exception:
        client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(size=384, distance=Distance.COSINE)
        )

    for surah in data:
        for verse in surah['verses']:
            doc = {
                "id": str(uuid.uuid4()),
                "payload": {
                    "surah_id": surah['id'],
                    "surah_name": surah['name'],
                    "surah_transliteration": surah['transliteration'],
                    "surah_type": surah['type'],
                    "verse_id": verse['id'],
                    "text": verse['text'],
                    "translation": verse['translation']
                },
                "vectorizing_text": f"Surah {surah['transliteration']} ({surah['name']}), verse number {verse['id']}: {verse['text']}, meaning '{verse['translation']}'"
            }
            embedding = ray.get(compute_embeddings.remote(doc["vectorizing_text"], model_name))
            client.upsert(
                collection_name=collection_name,
                points=[
                    {
                        "id": doc["id"],
                        "vector": embedding,
                        "payload": doc["payload"]
                    }
                ]
            )
    return "Processed and stored the vector database successfully!"
In this function, we use Qdrant as the vector database. You can download its Docker image from the Qdrant website and run it locally; the code will not work unless a Qdrant instance is running. We flatten the original JSON structure so each verse can be referenced easily, and we store those fields in the point payload. You can find the original Quranic JSON files and their structure at this GitHub repository.
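Based on the fields the code reads, each surah entry in the input file is expected to look roughly like this (only one verse shown; the source file may contain additional fields):

[
  {
    "id": 1,
    "name": "الفاتحة",
    "transliteration": "Al-Fatihah",
    "type": "meccan",
    "verses": [
      {
        "id": 1,
        "text": "بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ",
        "translation": "In the name of Allah, the Entirely Merciful, the Especially Merciful."
      }
    ]
  }
]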
5. Searching the Vector Database
def search_qdrant(query_text, model_name):
    query_vector = ray.get(compute_embeddings.remote(query_text, model_name))
    url = "http://localhost:6333"
    client = QdrantClient(url)
    collection_name = "quran_collection"
    search_result = client.search(
        collection_name=collection_name,
        query_vector=query_vector,
        limit=5
    )
    results = []
    for result in search_result:
        results.append({
            "Score": result.score,
            "Result": f"Surah {result.payload['surah_transliteration']} ({result.payload['surah_name']}), verse number {result.payload['verse_id']}: {result.payload['text']}, meaning '{result.payload['translation']}'"
        })
    return results
Although we store the full set of verse fields in the payload, each hit returned for a query is condensed into a single descriptive string, which is later fed to the LLM as context.
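Once the collection has been populated (and assuming Qdrant is still running), querying it looks like this; the query string and printed scores are purely illustrative:

results = search_qdrant("verses about mercy and forgiveness", "sentence-transformers/all-MiniLM-L6-v2")
for r in results:
    print(round(r["Score"], 3), r["Result"])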
6. Creating the Gradio Interface
def gradio_vectorize_and_store(json_file):
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    # Depending on the Gradio version, gr.File(type="filepath") may pass a plain
    # path string or a temp-file-like object with a .name attribute; handle both.
    json_file_path = json_file.name if hasattr(json_file, "name") else json_file
    return process_and_store_verses(json_file_path, model_name)
def gradio_query(message, chat_history):
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    search_results = search_qdrant(message, model_name)
    history = [
        {"role": "system", "content": "Based on the top 5 semantically closest search results from the vector database, provide detailed and respectful answers to questions about Quranic texts and their meanings. Use these assistant prompts to inform your responses and ensure they are accurate, informative, and culturally sensitive."},
    ]
    for result in search_results:
        history.append({"role": "assistant", "content": result["Result"]})
    history.append({"role": "user", "content": message})

    lm_client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    completion = lm_client.chat.completions.create(
        model="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
        messages=history,
        temperature=0.7,
        stream=True,
    )
    bot_message = ""
    for chunk in completion:
        if chunk.choices[0].delta.content:
            bot_message += chunk.choices[0].delta.content
    chat_history.append((message, bot_message))
    return "", chat_history
with gr.Blocks() as app:
    with gr.Tab("Vectorize and Store"):
        gr.Markdown("## Upload JSON File to Vectorize and Store")
        json_file_input = gr.File(label="Upload JSON File", type="filepath")
        vectorize_button = gr.Button("Vectorize and Store")
        vectorize_output = gr.Textbox(label="Status")
        vectorize_button.click(gradio_vectorize_and_store, inputs=json_file_input, outputs=vectorize_output)
    with gr.Tab("Quran Chat"):
        gr.Markdown("## Message to Quran Chat")
        chatbot = gr.Chatbot()
        msg = gr.Textbox()
        clear = gr.ClearButton([msg, chatbot])
        msg.submit(gradio_query, [msg, chatbot], [msg, chatbot])

app.launch()
In this step, we set up a user-friendly interface using Gradio. We pass the model name "sentence-transformers/all-MiniLM-L6-v2" to SentenceTransformer, which generates embeddings for the text we supply. It is important to use the same model both for vectorizing and storing the text and for querying, so that all vectors live in the same embedding space. While we used "sentence-transformers/all-MiniLM-L6-v2" in this experiment, you can choose a different model based on your requirements for multilingual support, speed, efficiency, and semantic accuracy. Keep in mind that the vector size configured when creating the Qdrant collection (384 above) must match the embedding dimension of whichever model you pick.
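A minimal sketch of deriving that size from the model itself instead of hard-coding it (the model name is just the one used above; swap in your own):

from sentence_transformers import SentenceTransformer
from qdrant_client.models import VectorParams, Distance

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # or any other embedding model
model = SentenceTransformer(model_name)
vector_size = model.get_sentence_embedding_dimension()  # 384 for all-MiniLM-L6-v2
vectors_config = VectorParams(size=vector_size, distance=Distance.COSINE)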
Additionally, we use LM Studio, installed locally, as our LLM server. The LLM model we use is "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF". Again, you can select a different model based on your needs.
We intentionally define a system prompt that directs the LLM to base its answer on the five assistant messages built from the vector database results. The five returned hits are inserted as assistant messages and, along with the user's query, fed to the LLM. This setup helps the LLM generate more accurate and contextually relevant responses.
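For a hypothetical user question, the assembled message list ends up shaped like this (the verse strings come straight from the search results; content abbreviated here):

history = [
    {"role": "system", "content": "Based on the top 5 semantically closest search results from the vector database, ..."},
    {"role": "assistant", "content": "Surah Al-Fatihah (الفاتحة), verse number 1: ..., meaning '...'"},
    # ... four more assistant messages, one per search result ...
    {"role": "user", "content": "Which verses speak about mercy?"},
]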
Running the Application
Once you run all the code, the Gradio web interface will show up. Here are a few tips:
- Loading the JSON File: You only need to load and vectorize the JSON file once. Vectorizing and storing the data may take hours, depending on your hardware. I suggest running this process overnight and continuing the next day.
- Handling Data Issues: If you run into data issues that require deleting the database, you can do this from the Qdrant Dashboard (see the documentation on the Qdrant website) or programmatically, as shown in the sketch after this list.
- Using the Chat Interface: Once the JSON file is fully loaded, you can go to the Quran Chat tab and start playing with your queries.
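If you prefer to delete the collection from code rather than the dashboard, a minimal sketch (assuming the same local Qdrant instance and collection name used above):

from qdrant_client import QdrantClient

client = QdrantClient("http://localhost:6333")
client.delete_collection(collection_name="quran_collection")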
Conclusion
In this experiment, I successfully built a Quranic text retrieval system that uses generative AI for vectorization and Qdrant for efficient storage and retrieval. This system allows users to upload a JSON file of Quranic verses, vectorize the texts, and query the database to find semantically similar verses. The use of Gradio provides a user-friendly interface for interacting with the system.
I conducted a few experiments with the query functionality, testing both Arabic and English queries. With Arabic queries, semantic accuracy was poor. English queries yielded better results but still did not fully meet my expectations. For instance, when I copied the exact Arabic words of a verse from the Quran into the query, the top-scoring result returned from the vector database did not match that verse. When I did the same with the English translation, the top-scoring result did match. However, when I used a different translation of the same verse, the top-scoring result did not match, and in some runs none of the top 5 results matched.
These findings suggest that while the system performs better with English queries, there is room for improvement in both languages to achieve higher semantic accuracy.