Boosting SEO with AI: Enhancing Images

November 29, 2023

Do you have a sizable website with hundreds of images lacking SEO-friendly alt/title attributes? Discover how AI can remedy this. 🌐

The Importance of Image Alt/Titles in SEO

When it comes to websites with images, having alt and title attributes in HTML isn't just good practice - it's essential. There are numerous articles explaining why, like HubSpot's Article on Image SEO and Semrush's Article on Image SEO.

If you have just a few images, you can (and probably should) edit them manually and set those attributes. But what about platforms hosting hundreds? Or those with user-contributed images? We usually try to simplify image uploads for users by removing extra text fields - so the images end up without any attributes. Here's where AI becomes your ally. 💡

The AI Solution: Image-to-Text Models and LangChain

Our setup's centerpiece is Salesforce's BLIP image-to-text model (there are also alternative models available on Hugging Face). This model turns image contents into text descriptions, and it works surprisingly well across different image sizes and qualities - like some sort of black magic!
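
To get a feel for what it does before wiring up the whole pipeline, here's a minimal sketch of the model in isolation (assuming transformers, torch and Pillow are installed; photo.jpg is just a placeholder path):

    from transformers import BlipProcessor, BlipForConditionalGeneration
    from PIL import Image

    # Load the captioning model and its processor (downloads weights on first run).
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

    # "photo.jpg" is a placeholder - any local image works.
    raw_image = Image.open("photo.jpg").convert("RGB")
    inputs = processor(raw_image, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(processor.decode(outputs[0], skip_special_tokens=True))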

Alright, that's a good start, but we can enhance it further. We'll employ an LLM to refine these descriptions into SEO-optimized captions, using specific prompts and context. 🚀

Data Extraction and Pipeline Setup

Before diving into the pipeline, we need to extract the data - how depends on your website's backend, and could range from SQL queries to PHP functions that output CSV. Categorizing images (like user contributions or article covers) helps tailor the AI context, so when you export, make sure to include a column specifying where each image comes from - a category (e.g. user-contributed image, or article cover image).
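
For reference, the scripts later in this article assume the export has columns along these lines (the sample row is made up):

    File ID,File URL,IMG_TYPE,Node Title,CAPTION_NEW
    123,https://example.com/files/123.png,field_contributed_images,Some Article Title,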

With a consistent CSV export method in place, let's look at the pipeline. I've used a Docker Compose template for a containerized setup, ideal for future microservice deployment and CI/CD integration. 🐳

Development Tips:

  • For local development, if you want to install all dependencies on the host system, run uvicorn main:app --reload to start the FastAPI app.
  • For dependency-free local development, reuse Docker's virtual environment on your host (assuming the host runs the same Python version). You'll have to make sure the Python paths on the host match the paths inside the container - in my case, I had to create a symlink on the host: /usr/local/bin/python3.10
  • If using CUDA locally, map your current CUDA version volumes to save resources.

FastAPI and LangChain Integration

FastAPI serves as our microservice foundation, complemented by LangChain for LLM flexibility beyond OpenAI models (you can later pair it with any other LLM, even open-source ones). This combination lets us craft a powerful, scalable solution. 🛠️

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel
    import requests
    from transformers import BlipProcessor, BlipForConditionalGeneration
    from PIL import Image
    import torch
    import time

    app = FastAPI()


    class ImageRequest(BaseModel):
        context: str
        image_url: str
        id: int


    # Initialize Hugging Face model and processor with GPU support
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")

    # Check if CUDA is available and move the model to GPU
    if torch.cuda.is_available():
        model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large").to("cuda")
    else:
        model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
        print("CUDA not available, using CPU instead.")


    @app.post("/process-image")
    async def process_image(request: ImageRequest):
        try:
            # Download image from URL
            try:
                raw_image = Image.open(requests.get(request.image_url, stream=True).raw).convert('RGB')
            except Exception as e:
                raise HTTPException(status_code=500, detail=str(e))

            # Process and generate caption
            if torch.cuda.is_available():
                inputs = processor(raw_image, text="a photo of", return_tensors="pt").to("cuda")
            else:
                inputs = processor(raw_image, text="a photo of", return_tensors="pt")
                print("CUDA not available, using CPU instead.")

            outputs = model.generate(**inputs, max_new_tokens=100)
            caption = processor.decode(outputs[0], skip_special_tokens=True)

            print("AI caption: " + caption)

            # @TODO: Add LangChain here.

            return {"id": request.id, "caption": caption}
        except Exception as e:
            print("Exception during optimization: " + str(e))
            raise HTTPException(status_code=500, detail=str(e))

Essential Packages (pyproject.toml - using Poetry)

    [tool.poetry]
    name = "boilerplate"
    version = "0.1.0"
    description = "boilerplate-app"
    authors = ["Nikro"]

    [tool.poetry.dependencies]
    python = "^3.10"
    fastapi = "^0.104.1"
    uvicorn = "^0.24.0"
    transformers = "^4.35.2"
    torch = "^2.1.1"
    requests = "^2.31.0"
    Pillow = "^10.1.0"
    langchain = "^0.0.339"
    openai = "^1.3.4"

    [tool.poetry.dev-dependencies]

    [build-system]
    requires = ["poetry-core>=1.0.0"]
    build-backend = "poetry.core.masonry.api"

Docker Compose and Dockerfile:

docker-compose.yml:

    version: "3"

    services:
      # Node container (for Husky, linting, etc).
      node:
        image: node:latest
        restart: unless-stopped
        working_dir: /usr/src/app
        command: ["tail", "-f", "/dev/null"]
        volumes:
          - /usr/bin/git:/usr/bin/git:ro
          - ~/.gitconfig:/home/node/.gitconfig:ro
          - .:/usr/src/app

      # App we want to build using our /app/Dockerfile.
      app:
        build:
          context: ./app
          dockerfile: Dockerfile
        restart: unless-stopped
        working_dir: /usr/src/app
        ports:
          - "8000:8000"
        env_file:
          - .env # OpenAI key lives here.
        networks:
          - external
        volumes:
          - ./app:/usr/src/app
          - /usr/local/cuda:/usr/local/cuda # Map CUDA to the container.
        runtime: nvidia
        environment:
          - NVIDIA_VISIBLE_DEVICES=all # You need this to use the GPU.

    # There are 2 networks: external and internal.
    networks:
      external:
        driver: bridge
      internal:
        driver: bridge

Dockerfile:

    # Use the official lightweight Python image.
    FROM python:3.10-slim

    # Set environment variables
    ENV PYTHONDONTWRITEBYTECODE 1
    ENV PYTHONUNBUFFERED 1
    ENV POETRY_VERSION 1.1.13
    ENV PYTHONPATH /usr/src/app

    # Install system dependencies
    RUN apt-get update && \
        apt-get install -y --no-install-recommends build-essential gcc g++ curl && \
        rm -rf /var/lib/apt/lists/*

    # Install Poetry
    RUN curl -sSL https://install.python-poetry.org | python3 -

    # Add Poetry to PATH in .bashrc for interactive sessions
    RUN echo "export PATH=\"/root/.local/bin:$PATH\"" >> /root/.bashrc

    # Set the PATH for subsequent Docker layers
    ENV PATH "/root/.local/bin:${PATH}"

    RUN poetry --version

    # Set the working directory inside the container
    WORKDIR /usr/src/app

    # Copy the pyproject.toml file (and optionally poetry.lock) into the container
    COPY pyproject.toml poetry.lock* ./

    # Install project dependencies
    RUN poetry config virtualenvs.in-project true \
        && poetry install --no-interaction --no-ansi

    # Add the virtual environment bin directory to PATH
    ENV PATH="/usr/src/app/.venv/bin:$PATH"

    # Copy the rest of your app's source code from your host to your image filesystem.
    COPY . .

    # Expose the port and start the application (the CMD is passed as arguments to entrypoint.sh).
    EXPOSE 8000
    CMD ["uvicorn", "main:app", "--reload", "--host", "0.0.0.0"]
    ENTRYPOINT ["/usr/src/app/entrypoint.sh"]

Run `docker compose up` and test the setup at: http://localhost:8000/docs
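
Once it's up, you can also hit the endpoint directly - here's a quick sanity check in Python (the image URL is just a placeholder):

    import requests

    payload = {
        "context": "Cover image of a blog article about Christmas decorations.",
        "image_url": "https://example.com/some-image.jpg",  # Placeholder URL.
        "id": 1,
    }

    # The first call can be slow - the model may still be loading.
    response = requests.post("http://localhost:8000/process-image", json=payload, timeout=120)
    print(response.json())  # e.g. {"id": 1, "caption": "..."}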

Crafting the Caption: Generating Alt/Title Tags

With the image recognition part up and running, let's explore LangChain to refine our captions. Using GPT-3.5 Turbo, we can perfect our SEO-friendly captions with this prompt template:

✅ Prompt Template:

"""
You are an advanced AI specializing in creating SEO-optimized image captions for use as alt text or title text. Your captions must follow SEO best practices, focusing on concise, descriptive language that includes relevant keywords. You will be given a context describing where the image was used or uploaded and an initial caption generated by an image captioning tool. It is your job to refine this caption to make it more relevant and SEO-friendly, correcting any contextual mistakes made by the initial tool.

Context: {context}

Initial Image Caption: {caption}

Refined SEO-optimized Image Caption:
"""

Updated to work seamlessly with LangChain, our snippet now looks like this: 

✅ LangChain Snippet (same function, but this time with LangChain):

    # New imports for the LangChain part (in addition to the earlier ones):
    from langchain.chat_models import ChatOpenAI
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain
    from langchain.callbacks import StdOutCallbackHandler


    @app.post("/process-image")
    async def process_image(request: ImageRequest):
        try:
            # Download image from URL
            try:
                raw_image = Image.open(requests.get(request.image_url, stream=True).raw).convert('RGB')
            except Exception as e:
                raise HTTPException(status_code=500, detail=str(e))

            # Process and generate caption
            if torch.cuda.is_available():
                inputs = processor(raw_image, text="a photo of", return_tensors="pt").to("cuda")
            else:
                inputs = processor(raw_image, text="a photo of", return_tensors="pt")
                print("CUDA not available, using CPU instead.")

            outputs = model.generate(**inputs, max_new_tokens=100)
            caption = processor.decode(outputs[0], skip_special_tokens=True)

            print("AI caption: " + caption)

            # Now let's process it with LangChain.
            llm = ChatOpenAI(temperature=0.2, model="gpt-3.5-turbo-1106", request_timeout=20)
            prompt = PromptTemplate.from_template(
                "You are an advanced AI specializing in creating SEO-optimized image captions for use as alt text or title "
                "text. Your captions must follow SEO best practices, focusing on concise, descriptive language that "
                "includes relevant keywords. You will be given a context describing where the image was used or uploaded "
                "and an initial caption generated by an image captioning tool. It is your job to refine this caption to "
                "make it more relevant and SEO-friendly, correcting any contextual mistakes made by the initial tool. "
                "\n\n"
                "Context: {context}\n\n"
                "Initial Image Caption: {caption}\n\n"
                "Refined SEO-optimized Image Caption:"
            )

            try:
                print("Optimizing caption...")
                handler = StdOutCallbackHandler()
                llm_chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
                caption_optimized = llm_chain.run({"context": request.context, "caption": caption})
                print("Result: " + caption_optimized)

                # Set delay of 3 seconds between each prompt - because OpenAI sometimes does NOT reply.
                time.sleep(3)
            except Exception as e:
                print("Exception during optimization: " + str(e))
                raise HTTPException(status_code=500, detail=str(e))

            return {"id": request.id, "caption": caption_optimized}
        except Exception as e:
            print("Exception during processing: " + str(e))
            raise HTTPException(status_code=500, detail=str(e))

Test it out on our setup page and see the magic happen! ✨ 

The new reply looks like this:

    {
      "id": 0,
      "caption": "festive Christmas tree in a cozy room with a view of the outdoors"
    }

Implementing the API for Real-world Use

Our API is live and ready at http://localhost:8000. Let's put it to work with a Python script that updates captions in our CSV, one by one. I suggest starting with a small batch of 10-20 images to make sure everything runs smoothly.

Here’s an example of the script.

Example Python script (feeding the CSV through our API):

    import csv
    import requests


    def generate_context(row):
        # This is the category-based context.
        if row['IMG_TYPE'] == 'field_contributed_images':
            return (f"This image was uploaded by a user, as a comment in an article called {row['Node Title']}. As this is "
                    f"a contributed image, probably it's a complaint or a meme. Be aware of it.")
        else:
            return "No context - try and figure this one out on your own."


    def process_and_update_row(row):
        context = generate_context(row)
        payload = {
            "context": context,
            "image_url": row['File URL'],
            "id": row['File ID']
        }

        try:
            response = requests.post('http://localhost:8000/process-image', json=payload)
            if response.status_code == 200:
                row['CAPTION_NEW'] = response.json().get('caption')
                print(f"Processed File ID {row['File ID']}: Caption generated.")
            else:
                row['CAPTION_NEW'] = "Error generating caption."
                print(f"Error for File ID {row['File ID']}: {response.text}")
        except Exception as e:
            print(f"Exception for File ID {row['File ID']}: {str(e)}")
            row['CAPTION_NEW'] = "Exception during processing."

        return row


    def process_csv(file_path, limit=50):
        processed_count = 0
        with open(file_path, mode='r', encoding='utf-8') as file:
            rows = list(csv.DictReader(file))

        # Skip rows that already have a caption; stop once we hit the limit.
        for i, row in enumerate(rows):
            if processed_count >= limit or row.get('CAPTION_NEW'):
                continue

            rows[i] = process_and_update_row(row)
            processed_count += 1

        # Write all rows back to the CSV file.
        with open(file_path, mode='w', newline='', encoding='utf-8') as file:
            writer = csv.DictWriter(file, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)


    csv_file_path = 'file.csv'
    limit = 10  # Adjust the limit as needed - remember, 1st time do around 10 to test the results.
    process_csv(csv_file_path, limit=limit)

If adjustments are needed, tweak the main prompt or contexts for each image category until you're satisfied with the outcomes. 🔄

The Final Step: Updating Your System

After perfecting your captions, it's time for the final act – updating your website's images with these new, AI-generated alt/title tags. Run an update script within your system to apply these changes, ensuring your images are now SEO-friendly and more accessible. 🌍
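
That update script depends entirely on your stack, so here's just a rough sketch that turns the finished CSV into parameterized SQL UPDATE statements - the table and column names (image_field_data, alt, title, fid) are made up, so adapt them to your own schema:

    import csv

    # Hypothetical table/column names - adjust these to your own schema.
    SQL_TEMPLATE = "UPDATE image_field_data SET alt = %s, title = %s WHERE fid = %s;"

    def build_updates(csv_path):
        with open(csv_path, encoding='utf-8') as f:
            for row in csv.DictReader(f):
                caption = (row.get('CAPTION_NEW') or '').strip()
                # Skip rows that failed or were never processed.
                if caption and not caption.startswith(("Error", "Exception")):
                    yield SQL_TEMPLATE, (caption, caption, row['File ID'])

    # Print the statements - or execute them with your DB driver of choice.
    for sql, params in build_updates('file.csv'):
        print(sql, params)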

Embracing AI in SEO and Web Development

The world of open-source models offers a plethora of solutions for tasks you'd think were impossible, and in record time too. Hugging Face is an excellent resource to explore these models. Using LangChain combined with LLMs creates a solid, efficient workflow. Plus, containerizing your setup as a microservice using templates not only accelerates development but also keeps things organized and scalable. 🚀

Other things to consider for enhancing the process (not covered in this article):

  • Consider using the same pipeline to rename the images (from 123.png to something like christmas-tree.png - and make sure you don't end up with duplicates), suggested by Andrian Valeanu - see the sketch after this list;
  • Consider using a length limit for those captions - suggested by Andrian Valeanu as well;
  • You can also rewrite the images (a bit more complex) - so that they incorporate the captions and authoring (maybe?) as metadata of the image - embedded straight into the image - however this requires you to re-save all the images - also suggested by Andrian Valeanu 😉
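
For the renaming idea from the first bullet, a minimal sketch of turning a caption into a unique, SEO-friendly filename might look like this - purely illustrative, and the counter-based dedupe is just one possible strategy:

    import re

    def caption_to_filename(caption, extension, taken):
        # Lowercase, alphanumeric words joined by hyphens, capped at ~6 words.
        slug = re.sub(r'[^a-z0-9]+', '-', caption.lower()).strip('-')
        slug = '-'.join(slug.split('-')[:6])

        # Avoid duplicates by appending a counter (an assumed strategy).
        candidate, counter = slug, 1
        while candidate in taken:
            counter += 1
            candidate = f"{slug}-{counter}"
        taken.add(candidate)
        return f"{candidate}.{extension}"

    taken = set()
    print(caption_to_filename("Festive Christmas tree in a cozy room", "png", taken))
    # -> festive-christmas-tree-in-a-cozy.png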

I'm committed to creating more articles about how developers and entrepreneurs like you can leverage these Open Source models in practical, hands-on scenarios. If you find these articles useful, consider supporting my work on Patreon. Every bit of support is greatly appreciated! 💖

 

Feel free to ask any questions or share any suggestions in the comments!