Boosting SEO with AI: Enhancing Images
Do you have a sizable website with hundreds of images lacking SEO-friendly alt/title attributes? Discover how AI can remedy this. 🌐
The Importance of Image Alt/Titles in SEO
When it comes to websites with images, having alt and title attributes in HTML isn't just good practice – it's essential. There are numerous articles explaining why, like HubSpot’s Article on Image SEO and Semrush’s Article on Image SEO.
If you have just a few images, you can (and probably should) edit them manually and fill in those attributes. But what about platforms hosting hundreds? Or sites with user-contributed images? We usually simplify the upload flow for users by removing extra text fields, so the images end up without any attributes. Here's where AI becomes your ally. 💡
The AI Solution: Image-to-Text Models and LangChain
Our setup's centerpiece is Salesforce's BLIP image-to-text model (alternative models are available on Hugging Face). This model turns image contents into text descriptions, and it handles various image sizes and qualities surprisingly well - like some sort of black magic!
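If you want a quick feel for the model before wiring up the whole service, here's a minimal standalone sketch (the file name is just a placeholder; it assumes transformers, torch, and Pillow are installed):

```python
# Minimal standalone sketch: caption a single local image with BLIP.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

image = Image.open("some-photo.jpg").convert("RGB")  # placeholder file name
inputs = processor(image, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(outputs[0], skip_special_tokens=True))
```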
Alright, that's a good start, but we can enhance it further. We'll employ an LLM to refine these raw descriptions into SEO-optimized captions, using specific prompts and context. 🚀
Data Extraction and Pipeline Setup
Before diving into the pipeline, we need to extract the data; how depends on your website's backend - anything from SQL queries to PHP functions that output CSV. Categorizing images helps tailor the AI context, so when you export them, include a column that specifies where each image comes from - a category such as "user-contributed image" or "article cover image".
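For reference, here's the kind of CSV shape the processing script later in this article expects. The column names below are the ones that script actually reads; the row values are placeholders:

```python
# Hypothetical CSV export - column names match what the processing
# script further down expects; the values here are made up.
import csv

fieldnames = ["File ID", "File URL", "Node Title", "IMG_TYPE", "CAPTION_NEW"]
with open("file.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({
        "File ID": 123,
        "File URL": "https://example.com/files/123.png",  # placeholder URL
        "Node Title": "Some Article",
        "IMG_TYPE": "field_contributed_images",  # the category column
        "CAPTION_NEW": "",  # left empty - the pipeline fills this in
    })
```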
With a consistent CSV export method, let's look at the pipeline. I've used a docker compose template for a containerized setup, ideal for future microservice deployment and CI/CD integration. 🐳
Development Tips:
- For local development, if you want to install all dependencies on the host system, start the FastAPI app with: `uvicorn main:app --reload`.
- For dependency-free local development: reuse the virtual environment created inside Docker on your host (assuming you have the same Python version). You'll have to make sure the Python paths on the host match the paths inside the container. In my case, I had to create a symlink on the host: /usr/local/bin/python3.10
- If using CUDA locally, map your host's existing CUDA installation into the container as a volume to save resources.
FastAPI and LangChain Integration
FastAPI serves as our microservice foundation, complemented by LangChain for LLM flexibility beyond OpenAI's models (you can later pair it with any other LLM, even open-source ones). This combination lets us craft a powerful, scalable solution. 🛠️
```python
import time

import requests
import torch
from fastapi import FastAPI, HTTPException
from PIL import Image
from pydantic import BaseModel
from transformers import BlipProcessor, BlipForConditionalGeneration

app = FastAPI()


class ImageRequest(BaseModel):
    context: str
    image_url: str
    id: int


# Initialize the Hugging Face model and processor with GPU support.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")

# Check if CUDA is available and move the model to the GPU.
if torch.cuda.is_available():
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large").to("cuda")
else:
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
    print("CUDA not available, using CPU instead.")


@app.post("/process-image")
async def process_image(request: ImageRequest):
    try:
        # Download the image from the URL.
        try:
            raw_image = Image.open(requests.get(request.image_url, stream=True).raw).convert('RGB')
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

        # Process the image and generate a caption.
        if torch.cuda.is_available():
            inputs = processor(raw_image, text="a photo of", return_tensors="pt").to("cuda")
        else:
            inputs = processor(raw_image, text="a photo of", return_tensors="pt")
            print("CUDA not available, using CPU instead.")
        outputs = model.generate(**inputs, max_new_tokens=100)
        caption = processor.decode(outputs[0], skip_special_tokens=True)
        print("AI caption: " + caption)

        # @TODO: Add LangChain here.
        return {"id": request.id, "caption": caption}
    except Exception as e:
        print("Exception during optimization: " + str(e))
        raise HTTPException(status_code=500, detail=str(e))
```
Essential Packages (pyproject.toml - using Poetry):
```toml
[tool.poetry]
name = "boilerplate"
version = "0.1.0"
description = "boilerplate-app"
authors = ["Nikro"]

[tool.poetry.dependencies]
python = "^3.10"
fastapi = "^0.104.1"
uvicorn = "^0.24.0"
transformers = "^4.35.2"
torch = "^2.1.1"
requests = "^2.31.0"
Pillow = "^10.1.0"
langchain = "^0.0.339"
openai = "^1.3.4"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```
Docker Compose and Dockerfile:
✅ Docker-compose.yml:
version: "3" services: # Node container (for Husky, linting, etc). node: image: node:latest restart: unless-stopped working_dir: /usr/src/app command: ["tail", "-f", "/dev/null"] volumes: - /usr/bin/git:/usr/bin/git:ro - ~/.gitconfig:/home/node/.gitconfig:ro - .:/usr/src/app # App we want to build using our /app/Dockerfile. app: build: context: ./app dockerfile: Dockerfile restart: unless-stopped working_dir: /usr/src/app ports: - "8000:8000" env_file: - .env # OpenAI key lives here. networks: - external volumes: - ./app:/usr/src/app - /usr/local/cuda:/usr/local/cuda # Map CUDA to the container. runtime: nvidia environment: - NVIDIA_VISIBLE_DEVICES=all # You need this to use the GPU. # There are 2 networks: external and internal. networks: external: driver: bridge internal: driver: bridge
✅ Dockerfile:
```dockerfile
# Use the official lightweight Python image.
FROM python:3.10-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV POETRY_VERSION 1.1.13
ENV PYTHONPATH /usr/src/app

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential gcc g++ curl && \
    rm -rf /var/lib/apt/lists/*

# Install Poetry
RUN curl -sSL https://install.python-poetry.org | python3 -

# Add Poetry to PATH in .bashrc for interactive sessions
RUN echo "export PATH=\"/root/.local/bin:$PATH\"" >> /root/.bashrc

# Set the PATH for subsequent Docker layers
ENV PATH "/root/.local/bin:${PATH}"

RUN poetry --version

# Set the working directory inside the container
WORKDIR /usr/src/app

# Copy the pyproject.toml file (and optionally poetry.lock) into the container
COPY pyproject.toml poetry.lock* ./

# Install project dependencies
RUN poetry config virtualenvs.in-project true \
    && poetry install --no-interaction --no-ansi

# Add the virtual environment bin directory to PATH
ENV PATH="/usr/src/app/.venv/bin:$PATH"

# Copy the rest of your app's source code from your host to your image filesystem.
COPY . .

# Run the command to start your application
EXPOSE 8000
CMD ["uvicorn", "main:app", "--reload", "--host", "0.0.0.0"]
ENTRYPOINT ["/usr/src/app/entrypoint.sh"]
```
Run `docker compose up` and test the setup at: http://localhost:8000/docs
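Besides the interactive /docs page, you can hit the endpoint directly. Here's a quick smoke test (the image URL is a placeholder - swap in a real one):

```python
# Quick smoke test for the /process-image endpoint.
import requests

response = requests.post(
    "http://localhost:8000/process-image",
    json={
        "id": 1,
        "context": "Article cover image for a blog post about SEO.",
        "image_url": "https://example.com/some-image.jpg",  # placeholder URL
    },
)
print(response.json())  # e.g. {"id": 1, "caption": "..."}
```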
Crafting the Caption: Generating Alt/Title Tags
With the image recognition part up and running, let's explore LangChain to refine our captions. Using GPT-3.5, we can polish our captions into SEO-friendly ones with this prompt template:
✅ Prompt Template:
"""
You are an advanced AI specializing in creating SEO-optimized image captions for use as alt text or title text. Your captions must follow SEO best practices, focusing on concise, descriptive language that includes relevant keywords. You will be given a context describing where the image was used or uploaded and an initial caption generated by an image captioning tool. It is your job to refine this caption to make it more relevant and SEO-friendly, correcting any contextual mistakes made by the initial tool.
Context: {context}
Initial Image Caption: {caption}
Refined SEO-optimized Image Caption:
"""
Updated to work seamlessly with LangChain, our snippet now looks like this:
✅ LangChain Snippet (same function, but this time with LangChain):
```python
import time

from langchain.callbacks import StdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate


@app.post("/process-image")
async def process_image(request: ImageRequest):
    try:
        # Download the image from the URL.
        try:
            raw_image = Image.open(requests.get(request.image_url, stream=True).raw).convert('RGB')
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))

        # Process the image and generate the initial caption.
        if torch.cuda.is_available():
            inputs = processor(raw_image, text="a photo of", return_tensors="pt").to("cuda")
        else:
            inputs = processor(raw_image, text="a photo of", return_tensors="pt")
            print("CUDA not available, using CPU instead.")
        outputs = model.generate(**inputs, max_new_tokens=100)
        caption = processor.decode(outputs[0], skip_special_tokens=True)
        print("AI caption: " + caption)

        # Now let's process it with LangChain.
        llm = ChatOpenAI(temperature=0.2, model="gpt-3.5-turbo-1106", request_timeout=20)
        prompt = PromptTemplate.from_template(
            "You are an advanced AI specializing in creating SEO-optimized image captions for use as alt text or "
            "title text. Your captions must follow SEO best practices, focusing on concise, descriptive language "
            "that includes relevant keywords. You will be given a context describing where the image was used or "
            "uploaded and an initial caption generated by an image captioning tool. It is your job to refine this "
            "caption to make it more relevant and SEO-friendly, correcting any contextual mistakes made by the "
            "initial tool."
            "\n\n"
            "Context: {context}\n\n"
            "Initial Image Caption: {caption}\n\n"
            "Refined SEO-optimized Image Caption:"
        )

        print("Optimizing caption...")
        handler = StdOutCallbackHandler()
        llm_chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
        caption_optimized = llm_chain.run({"context": request.context, "caption": caption})
        print("Result: " + caption_optimized)

        # Delay of 3 seconds between prompts - because OpenAI sometimes does NOT reply.
        time.sleep(3)

        return {"id": request.id, "caption": caption_optimized}
    except Exception as e:
        print("Exception during optimization: " + str(e))
        raise HTTPException(status_code=500, detail=str(e))
```
Test it out on our setup page and see the magic happen! ✨
The new reply looks like this:
{ "id": 0, "caption": "festive Christmas tree in a cozy room with a view of the outdoors" }
Implementing the API for Real-world Use
Our API is live and ready at http://localhost:8000. Let's put it to work with a Python script that updates the captions in our CSV, one row at a time. I suggest starting with a small batch of 10-20 images to ensure everything is running smoothly.
Here's an example of the script (it feeds each CSV row through our API):
```python
import csv

import requests


def generate_context(row):
    # This is the category-based context.
    if row['IMG_TYPE'] == 'field_contributed_images':
        return (f"This image was uploaded by a user, as a comment in an article called {row['Node Title']}. As this "
                f"is a contributed image, probably it's a complaint or a meme. Be aware of it.")
    else:
        return "No context - try and figure this one out on your own."


def process_and_update_row(row):
    context = generate_context(row)
    payload = {
        "context": context,
        "image_url": row['File URL'],
        "id": row['File ID']
    }
    try:
        response = requests.post('http://localhost:8000/process-image', json=payload)
        if response.status_code == 200:
            row['CAPTION_NEW'] = response.json().get('caption')
            print(f"Processed File ID {row['File ID']}: Caption generated.")
        else:
            row['CAPTION_NEW'] = "Error generating caption."
            print(f"Error for File ID {row['File ID']}: {response.text}")
    except Exception as e:
        print(f"Exception for File ID {row['File ID']}: {str(e)}")
        row['CAPTION_NEW'] = "Exception during processing."
    return row


def process_csv(file_path, limit=50):
    processed_count = 0
    with open(file_path, mode='r', encoding='utf-8') as file:
        rows = list(csv.DictReader(file))

    for row in rows:
        # Skip already-captioned rows; stop processing once we hit the limit.
        if processed_count >= limit or row.get('CAPTION_NEW'):
            continue
        process_and_update_row(row)  # Mutates the row in place.
        processed_count += 1

    # Write the updated rows back to the CSV file.
    with open(file_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)


csv_file_path = 'file.csv'
limit = 10  # Adjust the limit as needed - the first time, do around 10 to test the results.
process_csv(csv_file_path, limit=limit)
```
If adjustments are needed, tweak the main prompt or contexts for each image category until you're satisfied with the outcomes. 🔄
The Final Step: Updating Your System
After perfecting your captions, it's time for the final act – updating your website's images with these new, AI-generated alt/title tags. Run an update script within your system to apply these changes, ensuring your images are now SEO-friendly and more accessible. 🌍
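What that update script looks like depends entirely on your backend, so treat the following as a rough sketch only - the database, table, and column names are invented, and a real CMS (Drupal, WordPress, etc.) stores image attributes differently:

```python
# Rough sketch of the final update step, assuming a generic SQL table
# that stores alt/title text per file ID - adapt to your CMS's schema.
import csv
import sqlite3  # Stand-in for your real database driver.

conn = sqlite3.connect("website.db")  # hypothetical database
with open("file.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        caption = row.get("CAPTION_NEW", "")
        if not caption or caption.startswith(("Error", "Exception")):
            continue  # Skip rows the pipeline failed on.
        conn.execute(
            "UPDATE image_fields SET alt = ?, title = ? WHERE file_id = ?",  # hypothetical table/columns
            (caption, caption, row["File ID"]),
        )
conn.commit()
conn.close()
```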
Embracing AI in SEO and Web Development
The world of open-source models offers a plethora of solutions for tasks you'd think were impossible, and in record time too. Hugging Face is an excellent resource to explore these models. Using LangChain combined with LLMs creates a solid, efficient workflow. Plus, containerizing your setup as a microservice using templates not only accelerates development but also keeps things organized and scalable. 🚀
Other enhancements to consider (not covered in this article):
- Consider using the same pipeline to rename the images (from 123.png to something like christmas-tree.png, making sure you don't create duplicates; see the sketch after this list) - suggested by Andrian Valeanu;
- Consider enforcing a length limit on those captions - also suggested by Andrian Valeanu;
- You can also rewrite the images themselves (a bit more complex) - embedding the captions and perhaps authoring information as metadata straight into the image files - though this requires re-saving all the images; also suggested by Andrian Valeanu 😉
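To illustrate the first two suggestions, here's a rough sketch of caption-based renaming with a length limit and duplicate handling. The file layout and the 80-character cap are my own assumptions, not part of the pipeline above:

```python
# Sketch: rename image files based on their new captions, with a
# length limit and de-duplication. Paths are hypothetical.
import csv
import os
import re

MAX_LENGTH = 80  # Arbitrary cap for captions used in filenames.

def slugify(caption: str) -> str:
    # "Festive Christmas tree..." -> "festive-christmas-tree"
    slug = re.sub(r"[^a-z0-9]+", "-", caption.lower()).strip("-")
    return slug[:MAX_LENGTH].rstrip("-")

seen = {}
with open("file.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        caption = row.get("CAPTION_NEW", "")
        if not caption:
            continue
        base = slugify(caption)
        # De-duplicate: christmas-tree.png, christmas-tree-2.png, ...
        seen[base] = seen.get(base, 0) + 1
        suffix = f"-{seen[base]}" if seen[base] > 1 else ""
        old_path = f"images/{row['File ID']}.png"  # hypothetical layout
        new_path = f"images/{base}{suffix}.png"
        if os.path.exists(old_path):
            os.rename(old_path, new_path)
            print(f"{old_path} -> {new_path}")
```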
I'm committed to creating more articles about how developers and entrepreneurs like you can leverage these Open Source models in practical, hands-on scenarios. If you find these articles useful, consider supporting my work on Patreon. Every bit of support is greatly appreciated! 💖
Feel free to ask any questions or share any suggestions!