Boosting SEO with AI: Enhancing Images

November 29, 2023

Do you have a sizable website with hundreds of images lacking SEO-friendly alt/title attributes? Discover how AI can remedy this. 🌐

The Importance of Image Alt/Titles in SEO

When it comes to websites with images, having alt and title attributes in HTML isn't just good practice - it's essential. There are numerous articles explaining why, like HubSpot's Article on Image SEO and Semrush's Article on Image SEO.

If you have just a few images, you can (and probably should) edit them manually and set those attributes. But what about platforms hosting hundreds? Or those with user-contributed images? We usually try to simplify image uploads for users by removing extra text fields - so the images end up without any attributes. Here's where AI becomes your ally. 💡

The AI Solution: Image-to-Text Models and LangChain

Our setup's centerpiece is Salesforce's BLIP image-to-text model (there are also alternative models available on Hugging Face). This model turns image contents into text descriptions, and it works surprisingly well across different image sizes and qualities - like some sort of black magic!
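
To get a feel for what it does before wiring up the whole pipeline, here's a minimal sketch of the model in isolation (assuming transformers, torch and Pillow are installed; photo.jpg is just a placeholder path):

    from transformers import BlipProcessor, BlipForConditionalGeneration
    from PIL import Image

    # Load the captioning model and its processor (downloads weights on first run).
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

    # "photo.jpg" is a placeholder - any local image works.
    raw_image = Image.open("photo.jpg").convert("RGB")
    inputs = processor(raw_image, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(processor.decode(outputs[0], skip_special_tokens=True))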

Alright, that's a good start, but we can enhance it further. We'll employ an LLM to refine these descriptions into SEO-optimized captions, using specific prompts and context. 🚀

Data Extraction and Pipeline Setup

Before diving into the pipeline, we need to extract the data - how depends on your website's backend, and could range from SQL queries to PHP functions that output CSV. Categorizing images (like user contributions or article covers) helps tailor the AI context, so when you export, make sure to include a column specifying where each image comes from - a category (e.g. user-contributed image, or article cover image).
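
For reference, the scripts later in this article assume the export has columns along these lines (the sample row is made up):

    File ID,File URL,IMG_TYPE,Node Title,CAPTION_NEW
    123,https://example.com/files/123.png,field_contributed_images,Some Article Title,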

With a consistent CSV export method in place, let's look at the pipeline. I've used a Docker Compose template for a containerized setup, ideal for future microservice deployment and CI/CD integration. 🐳

Development Tips:

  • For local development, if you want to install all dependencies on the host system, run uvicorn main:app --reload to start the FastAPI app.
  • For dependency-free local development, reuse Docker's virtual environment on your host (assuming the host runs the same Python version). You'll have to make sure the Python paths on the host match the paths inside the container - in my case, I had to create a symlink on the host: /usr/local/bin/python3.10
  • If using CUDA locally, map your current CUDA version volumes to save resources.

FastAPI and LangChain Integration

FastAPI serves as our microservice foundation, complemented by LangChain for LLM flexibility beyond OpenAI models (you can later pair it with any other LLM, even open-source ones). This combination lets us craft a powerful, scalable solution. 🛠️

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel
    import requests
    from transformers import BlipProcessor, BlipForConditionalGeneration
    from PIL import Image
    import torch
    import time

    app = FastAPI()


    class ImageRequest(BaseModel):
        context: str
        image_url: str
        id: int


    # Initialize Hugging Face model and processor with GPU support
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")

    # Check if CUDA is available and move the model to GPU
    if torch.cuda.is_available():
        model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large").to("cuda")
    else:
        model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
        print("CUDA not available, using CPU instead.")


    @app.post("/process-image")
    async def process_image(request: ImageRequest):
        try:
            # Download image from URL
            try:
                raw_image = Image.open(requests.get(request.image_url, stream=True).raw).convert('RGB')
            except Exception as e:
                raise HTTPException(status_code=500, detail=str(e))

            # Process and generate caption
            if torch.cuda.is_available():
                inputs = processor(raw_image, text="a photo of", return_tensors="pt").to("cuda")
            else:
                inputs = processor(raw_image, text="a photo of", return_tensors="pt")
                print("CUDA not available, using CPU instead.")

            outputs = model.generate(**inputs, max_new_tokens=100)
            caption = processor.decode(outputs[0], skip_special_tokens=True)

            print("AI caption: " + caption)

            # @TODO: Add LangChain here.

            return {"id": request.id, "caption": caption}
        except Exception as e:
            print("Exception during optimization: " + str(e))
            raise HTTPException(status_code=500, detail=str(e))

Essential Packages (pyproject.toml - using Poetry)

    [tool.poetry]
    name = "boilerplate"
    version = "0.1.0"
    description = "boilerplate-app"
    authors = ["Nikro"]

    [tool.poetry.dependencies]
    python = "^3.10"
    fastapi = "^0.104.1"
    uvicorn = "^0.24.0"
    transformers = "^4.35.2"
    torch = "^2.1.1"
    requests = "^2.31.0"
    Pillow = "^10.1.0"
    langchain = "^0.0.339"
    openai = "^1.3.4"

    [tool.poetry.dev-dependencies]

    [build-system]
    requires = ["poetry-core>=1.0.0"]
    build-backend = "poetry.core.masonry.api"

Docker Compose and Dockerfile:

docker-compose.yml:

    version: "3"

    services:
      # Node container (for Husky, linting, etc).
      node:
        image: node:latest
        restart: unless-stopped
        working_dir: /usr/src/app
        command: ["tail", "-f", "/dev/null"]
        volumes:
          - /usr/bin/git:/usr/bin/git:ro
          - ~/.gitconfig:/home/node/.gitconfig:ro
          - .:/usr/src/app

      # App we want to build using our /app/Dockerfile.
      app:
        build:
          context: ./app
          dockerfile: Dockerfile
        restart: unless-stopped
        working_dir: /usr/src/app
        ports:
          - "8000:8000"
        env_file:
          - .env # OpenAI key lives here.
        networks:
          - external
        volumes:
          - ./app:/usr/src/app
          - /usr/local/cuda:/usr/local/cuda # Map CUDA to the container.
        runtime: nvidia
        environment:
          - NVIDIA_VISIBLE_DEVICES=all # You need this to use the GPU.

    # There are 2 networks: external and internal.
    networks:
      external:
        driver: bridge
      internal:
        driver: bridge

Dockerfile:

    # Use the official lightweight Python image.
    FROM python:3.10-slim

    # Set environment variables
    ENV PYTHONDONTWRITEBYTECODE 1
    ENV PYTHONUNBUFFERED 1
    ENV POETRY_VERSION 1.1.13
    ENV PYTHONPATH /usr/src/app

    # Install system dependencies
    RUN apt-get update && \
        apt-get install -y --no-install-recommends build-essential gcc g++ curl && \
        rm -rf /var/lib/apt/lists/*

    # Install Poetry
    RUN curl -sSL https://install.python-poetry.org | python3 -

    # Add Poetry to PATH in .bashrc for interactive sessions
    RUN echo "export PATH=\"/root/.local/bin:$PATH\"" >> /root/.bashrc

    # Set the PATH for subsequent Docker layers
    ENV PATH "/root/.local/bin:${PATH}"

    RUN poetry --version

    # Set the working directory inside the container
    WORKDIR /usr/src/app

    # Copy the pyproject.toml file (and optionally poetry.lock) into the container
    COPY pyproject.toml poetry.lock* ./

    # Install project dependencies
    RUN poetry config virtualenvs.in-project true \
        && poetry install --no-interaction --no-ansi

    # Add the virtual environment bin directory to PATH
    ENV PATH="/usr/src/app/.venv/bin:$PATH"

    # Copy the rest of your app's source code from your host to your image filesystem.
    COPY . .

    # Expose the port and start the application (the CMD is passed as arguments to entrypoint.sh).
    EXPOSE 8000
    CMD ["uvicorn", "main:app", "--reload", "--host", "0.0.0.0"]
    ENTRYPOINT ["/usr/src/app/entrypoint.sh"]

Run `docker compose up` and test the setup at: http://localhost:8000/docs
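
Once it's up, you can also hit the endpoint directly - here's a quick sanity check in Python (the image URL is just a placeholder):

    import requests

    payload = {
        "context": "Cover image of a blog article about Christmas decorations.",
        "image_url": "https://example.com/some-image.jpg",  # Placeholder URL.
        "id": 1,
    }

    # The first call can be slow - the model may still be loading.
    response = requests.post("http://localhost:8000/process-image", json=payload, timeout=120)
    print(response.json())  # e.g. {"id": 1, "caption": "..."}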

Crafting the Caption: Generating Alt/Title Tags

With the image recognition part up and running, let's explore LangChain to refine our captions. Using GPT-3.5 Turbo, we can perfect our SEO-friendly captions with this prompt template:

✅ Prompt Template:

"""
You are an advanced AI specializing in creating SEO-optimized image captions for use as alt text or title text. Your captions must follow SEO best practices, focusing on concise, descriptive language that includes relevant keywords. You will be given a context describing where the image was used or uploaded and an initial caption generated by an image captioning tool. It is your job to refine this caption to make it more relevant and SEO-friendly, correcting any contextual mistakes made by the initial tool.

Context: {context}

Initial Image Caption: {caption}

Refined SEO-optimized Image Caption:
"""

Updated to work seamlessly with LangChain, our snippet now looks like this: 

✅ LangChain Snippet (same function, but this time with LangChain):

    # New imports for the LangChain part (in addition to the earlier ones):
    from langchain.chat_models import ChatOpenAI
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain
    from langchain.callbacks import StdOutCallbackHandler


    @app.post("/process-image")
    async def process_image(request: ImageRequest):
        try:
            # Download image from URL
            try:
                raw_image = Image.open(requests.get(request.image_url, stream=True).raw).convert('RGB')
            except Exception as e:
                raise HTTPException(status_code=500, detail=str(e))

            # Process and generate caption
            if torch.cuda.is_available():
                inputs = processor(raw_image, text="a photo of", return_tensors="pt").to("cuda")
            else:
                inputs = processor(raw_image, text="a photo of", return_tensors="pt")
                print("CUDA not available, using CPU instead.")

            outputs = model.generate(**inputs, max_new_tokens=100)
            caption = processor.decode(outputs[0], skip_special_tokens=True)

            print("AI caption: " + caption)

            # Now let's process it with LangChain.
            llm = ChatOpenAI(temperature=0.2, model="gpt-3.5-turbo-1106", request_timeout=20)
            prompt = PromptTemplate.from_template(
                "You are an advanced AI specializing in creating SEO-optimized image captions for use as alt text or title "
                "text. Your captions must follow SEO best practices, focusing on concise, descriptive language that "
                "includes relevant keywords. You will be given a context describing where the image was used or uploaded "
                "and an initial caption generated by an image captioning tool. It is your job to refine this caption to "
                "make it more relevant and SEO-friendly, correcting any contextual mistakes made by the initial tool. "
                "\n\n"
                "Context: {context}\n\n"
                "Initial Image Caption: {caption}\n\n"
                "Refined SEO-optimized Image Caption:"
            )

            try:
                print("Optimizing caption...")
                handler = StdOutCallbackHandler()
                llm_chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
                caption_optimized = llm_chain.run({"context": request.context, "caption": caption})
                print("Result: " + caption_optimized)

                # Set delay of 3 seconds between each prompt - because OpenAI sometimes does NOT reply.
                time.sleep(3)
            except Exception as e:
                print("Exception during optimization: " + str(e))
                raise HTTPException(status_code=500, detail=str(e))

            return {"id": request.id, "caption": caption_optimized}
        except Exception as e:
            print("Exception during processing: " + str(e))
            raise HTTPException(status_code=500, detail=str(e))

Test it out on our setup page and see the magic happen! ✨ 

The new reply looks like this:

    {
      "id": 0,
      "caption": "festive Christmas tree in a cozy room with a view of the outdoors"
    }

Implementing the API for Real-world Use

Our API is live and ready at http://localhost:8000. Let's put it to work with a Python script that updates captions in our CSV, one by one. I suggest starting with a small batch of 10-20 images to make sure everything runs smoothly.

Here’s an example of the script.

Example Python script (feeding the CSV through our API):

    import csv
    import requests


    def generate_context(row):
        # This is the category-based context.
        if row['IMG_TYPE'] == 'field_contributed_images':
            return (f"This image was uploaded by a user, as a comment in an article called {row['Node Title']}. As this is "
                    f"a contributed image, probably it's a complaint or a meme. Be aware of it.")
        else:
            return "No context - try and figure this one out on your own."


    def process_and_update_row(row):
        context = generate_context(row)
        payload = {
            "context": context,
            "image_url": row['File URL'],
            "id": row['File ID']
        }

        try:
            response = requests.post('http://localhost:8000/process-image', json=payload)
            if response.status_code == 200:
                row['CAPTION_NEW'] = response.json().get('caption')
                print(f"Processed File ID {row['File ID']}: Caption generated.")
            else:
                row['CAPTION_NEW'] = "Error generating caption."
                print(f"Error for File ID {row['File ID']}: {response.text}")
        except Exception as e:
            print(f"Exception for File ID {row['File ID']}: {str(e)}")
            row['CAPTION_NEW'] = "Exception during processing."

        return row


    def process_csv(file_path, limit=50):
        processed_count = 0
        with open(file_path, mode='r', encoding='utf-8') as file:
            rows = list(csv.DictReader(file))

        # Skip rows that already have a caption; stop once we hit the limit.
        for i, row in enumerate(rows):
            if processed_count >= limit or row.get('CAPTION_NEW'):
                continue

            rows[i] = process_and_update_row(row)
            processed_count += 1

        # Write all rows back to the CSV file.
        with open(file_path, mode='w', newline='', encoding='utf-8') as file:
            writer = csv.DictWriter(file, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)


    csv_file_path = 'file.csv'
    limit = 10  # Adjust the limit as needed - remember, 1st time do around 10 to test the results.
    process_csv(csv_file_path, limit=limit)

If adjustments are needed, tweak the main prompt or contexts for each image category until you're satisfied with the outcomes. 🔄

The Final Step: Updating Your System

After perfecting your captions, it's time for the final act – updating your website's images with these new, AI-generated alt/title tags. Run an update script within your system to apply these changes, ensuring your images are now SEO-friendly and more accessible. 🌍
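
That update script depends entirely on your stack, so here's just a rough sketch that turns the finished CSV into parameterized SQL UPDATE statements - the table and column names (image_field_data, alt, title, fid) are made up, so adapt them to your own schema:

    import csv

    # Hypothetical table/column names - adjust these to your own schema.
    SQL_TEMPLATE = "UPDATE image_field_data SET alt = %s, title = %s WHERE fid = %s;"

    def build_updates(csv_path):
        with open(csv_path, encoding='utf-8') as f:
            for row in csv.DictReader(f):
                caption = (row.get('CAPTION_NEW') or '').strip()
                # Skip rows that failed or were never processed.
                if caption and not caption.startswith(("Error", "Exception")):
                    yield SQL_TEMPLATE, (caption, caption, row['File ID'])

    # Print the statements - or execute them with your DB driver of choice.
    for sql, params in build_updates('file.csv'):
        print(sql, params)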

Embracing AI in SEO and Web Development

The world of open-source models offers a plethora of solutions for tasks you'd think were impossible, and in record time too. Hugging Face is an excellent resource to explore these models. Using LangChain combined with LLMs creates a solid, efficient workflow. Plus, containerizing your setup as a microservice using templates not only accelerates development but also keeps things organized and scalable. 🚀

Other things to consider for enhancing the process (not covered in this article):

  • Consider using the same pipeline to rename the images (from 123.png to something like christmas-tree.png - and make sure you don't end up with duplicates), suggested by Andrian Valeanu - see the sketch after this list;
  • Consider using a length limit for those captions - suggested by Andrian Valeanu as well;
  • You can also rewrite the images (a bit more complex) - so that they incorporate the captions and authoring (maybe?) as metadata of the image - embedded straight into the image - however this requires you to re-save all the images - also suggested by Andrian Valeanu 😉
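
For the renaming idea from the first bullet, a minimal sketch of turning a caption into a unique, SEO-friendly filename might look like this - purely illustrative, and the counter-based dedupe is just one possible strategy:

    import re

    def caption_to_filename(caption, extension, taken):
        # Lowercase, alphanumeric words joined by hyphens, capped at ~6 words.
        slug = re.sub(r'[^a-z0-9]+', '-', caption.lower()).strip('-')
        slug = '-'.join(slug.split('-')[:6])

        # Avoid duplicates by appending a counter (an assumed strategy).
        candidate, counter = slug, 1
        while candidate in taken:
            counter += 1
            candidate = f"{slug}-{counter}"
        taken.add(candidate)
        return f"{candidate}.{extension}"

    taken = set()
    print(caption_to_filename("Festive Christmas tree in a cozy room", "png", taken))
    # -> festive-christmas-tree-in-a-cozy.png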

I'm committed to creating more articles about how developers and entrepreneurs like you can leverage these Open Source models in practical, hands-on scenarios. If you find these articles useful, consider supporting my work on Patreon. Every bit of support is greatly appreciated! 💖

 

Feel free to ask any questions or share any suggestions in the comments!