Optimizing Our Quranic Text Retrieval System with GPU Power

In one of my previous posts, we dived into building a Quranic Text Retrieval System using Generative AI. It was quite a journey, but we noticed that computing embeddings for our large dataset was a bit slow, even with Ray for parallelism. So, we went on a little adventure to find a better way. Spoiler alert: we found it! 🎉

After some research, we discovered that using a GPU to compute the embeddings could significantly speed things up. Here’s how we did it.


First, we initialized Ray and specified our available CPU and GPU resources:


import ray

# Initialize Ray, specifying GPU resources
ray.init(num_cpus=4, num_gpus=1)  # Adjust based on your available GPUs

This line tells Ray we have 4 CPUs and 1 GPU to work with.
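If you want to double-check what Ray actually registered (a quick sanity check, not part of the original pipeline), you can query the cluster resources right after initializing:

# Confirm Ray sees the CPU and GPU resources we declared
print(ray.cluster_resources())  # e.g. {'CPU': 4.0, 'GPU': 1.0, ...}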

Next, we defined a Ray actor that loads the model and computes embeddings on the GPU:


from sentence_transformers import SentenceTransformer

model_name = 'paraphrase-multilingual-mpnet-base-v2'

# Each worker is a Ray actor that loads the model once, on the GPU it was assigned
@ray.remote(num_gpus=1)
class EmbeddingWorker:
    def __init__(self, model_name):
        # SentenceTransformer picks up CUDA automatically inside the GPU-backed actor
        self.model = SentenceTransformer(model_name)

    def compute_embeddings(self, texts):
        # encode accepts a single string or a list of strings
        return self.model.encode(texts)

# Create a pool of workers
num_workers = 1  # Adjust based on your available GPUs
workers = [EmbeddingWorker.remote(model_name) for _ in range(num_workers)]

Because the actor requests num_gpus=1, Ray schedules it onto the GPU, so the SentenceTransformer model and its encode calls run there instead of on the CPU.
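If you're unsure whether the actor really lands on the GPU, here is a minimal verification sketch of my own (not from the original code), assuming PyTorch is installed, which SentenceTransformers already depends on:

import torch

@ray.remote(num_gpus=1)
def gpu_check():
    # Inside a task that requested a GPU, CUDA should be visible to PyTorch
    return torch.cuda.is_available(), ray.get_gpu_ids()

print(ray.get(gpu_check.remote()))  # e.g. (True, [0])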

Now, let’s talk about the magic that really sped things up. Instead of sending one remote call per verse, we now collect all of a surah’s verses and encode them as a single batch, letting the GPU process a whole surah at once. Check out the code below:


for surah_index, surah in enumerate(data):
    docs = []
    vectorizing_texts = []
    for verse in surah['verses']:
        doc = {
            "surah_id": surah['id'],
            "surah_name": surah['name'],
            "surah_transliteration": surah['transliteration'],
            "surah_type": surah['type'],
            "verse_id": verse['id'],
            "text": verse['text'],
            "translation": verse['translation']
        }
        docs.append(doc)
        vectorizing_texts.append(f"Surah {surah['transliteration']} ({surah['name']}), verse number {verse['id']}: {verse['text']}, meaning '{verse['translation']}'")
    
    # Send the whole surah's texts to one worker as a single batch
    # (round-robin across workers); encode embeds the full list in one GPU pass
    future = workers[surah_index % num_workers].compute_embeddings.remote(vectorizing_texts)
    embeddings = ray.get(future)  # one embedding per verse in this surah

By switching to per-surah embedding computation, we slashed the time from hours to just around 5 minutes! 🚀
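From there, the vectors just need to travel with their documents. Here is a minimal sketch of my own (assuming docs and embeddings stay aligned as in the loop above) of attaching each embedding to its document; in practice this would sit inside the per-surah loop, right after the ray.get call, before indexing into whatever store you use for retrieval:

# Attach each verse's embedding to its document before indexing
for doc, embedding in zip(docs, embeddings):
    doc["embedding"] = embedding.tolist()  # convert the NumPy vector to a JSON-friendly list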

That’s it for now! Stay tuned for more updates, and happy coding! 😊
