Optimizing Our Quranic Text Retrieval System with GPU Power
In one of my previous posts, we dived into building a Quranic Text Retrieval System using Generative AI. It was quite a journey, but we noticed that computing embeddings for our large dataset was slow, even with Ray for parallelism. So we went looking for a better way. Spoiler alert: we found it!
After some research, we discovered that using a GPU to compute the embeddings could significantly speed things up. Here’s how we did it.
First, we initialized Ray and specified our available CPU and GPU resources:
import ray

# Initialize Ray, specifying GPU resources
ray.init(num_cpus=4, num_gpus=1)  # Adjust based on your available CPUs and GPUs
This line tells Ray we have 4 CPUs and 1 GPU to work with.
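As a sanity check, you can also let Ray auto-detect resources and inspect what it (and PyTorch) can actually see. Here's a minimal sketch, assuming PyTorch is installed (SentenceTransformers depends on it); run it instead of, not after, the hard-coded init above:

import ray
import torch

ray.init()  # with no arguments, Ray auto-detects available CPUs and GPUs

# Confirm what Ray and CUDA can see before scheduling GPU work
print(ray.available_resources())   # e.g. {'CPU': 4.0, 'GPU': 1.0, ...}
print(torch.cuda.is_available())   # True if a usable CUDA device exists
print(torch.cuda.device_count())   # number of visible GPUs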
Next, we set up a Ray actor that loads the model once and computes embeddings on the GPU:
# Define a Ray actor that loads the model once and serves embedding requests
from sentence_transformers import SentenceTransformer

model_name = 'paraphrase-multilingual-mpnet-base-v2'

@ray.remote(num_gpus=1)
class EmbeddingWorker:
    def __init__(self, model_name):
        # SentenceTransformer moves the model to the GPU automatically
        # when CUDA is available
        self.model = SentenceTransformer(model_name)

    def compute_embeddings(self, texts):
        # encode accepts a single string or a list of strings
        return self.model.encode(texts)

# Create a pool of workers, one per available GPU
num_workers = 1  # Adjust based on your available GPUs
workers = [EmbeddingWorker.remote(model_name) for _ in range(num_workers)]
Here, each worker reserves a GPU (via num_gpus=1), and computes embeddings with SentenceTransformer's encode method.
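If you'd rather be explicit about device placement than rely on auto-detection, SentenceTransformer accepts a device argument. A minimal sketch with a CPU fallback:

import torch
from sentence_transformers import SentenceTransformer

# Pin the model to the first GPU explicitly; fall back to CPU if CUDA is absent
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2', device=device)
print(model.device)  # confirms where the model's weights live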
Now, let’s talk about the change that really sped things up. Instead of making one remote call per verse, we now batch all the verses of a surah into a single encode call. Check out the code below:
for surah in data:
    docs = []
    vectorizing_texts = []
    for verse in surah['verses']:
        doc = {
            "surah_id": surah['id'],
            "surah_name": surah['name'],
            "surah_transliteration": surah['transliteration'],
            "surah_type": surah['type'],
            "verse_id": verse['id'],
            "text": verse['text'],
            "translation": verse['translation']
        }
        docs.append(doc)
        vectorizing_texts.append(
            f"Surah {surah['transliteration']} ({surah['name']}), "
            f"verse number {verse['id']}: {verse['text']}, "
            f"meaning '{verse['translation']}'"
        )

    # Send the whole surah to a worker as a single batched call instead of
    # one remote call per verse; encode() batches the list on the GPU
    future = workers[surah['id'] % num_workers].compute_embeddings.remote(vectorizing_texts)
    embeddings = ray.get(future)
By switching to per-surah batches, we slashed the embedding time from hours to around 5 minutes!
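If you want to squeeze out more throughput, encode also exposes a batch_size argument, and you'll usually want to attach the returned vectors back to the documents they came from before indexing. A minimal sketch with the same model; the batch_size value and the 'embedding' field name are illustrative, not from the original pipeline:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')

# A larger batch size can improve GPU utilization; 64 is an illustrative
# guess, and the right value depends on your GPU memory
texts = ["example verse one", "example verse two"]
embeddings = model.encode(texts, batch_size=64, show_progress_bar=False)

# Pair each document with its vector before indexing (hypothetical field name)
docs = [{"verse_id": 1}, {"verse_id": 2}]
for doc, vector in zip(docs, embeddings):
    doc['embedding'] = vector.tolist()  # lists serialize cleanly to JSON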
That’s it for now! Stay tuned for more updates, and happy coding!