Leverage Python and Google Cloud to extract meaningful SEO insights from server log files


For my first post on Search Engine Land, I’ll start by quoting Ian Lurie:

Log file analysis is a lost art. But it can save your SEO butt!

Wise words.

However, getting the data we need out of server log files is usually laborious:

  • Gigantic log files require robust data ingestion pipelines, reliable cloud storage infrastructure, and a solid querying system
  • Careful data modeling is also needed in order to convert cryptic, raw log data into legible bits, suitable for exploratory data analysis and visualization

In the first post of this two-part series, I’ll show you how to easily scale your analyses to larger datasets, and extract meaningful SEO insights from your server logs.

All of that with just a pinch of Python and a dash of Google Cloud!

Here’s our detailed plan of action:

#1 – I’ll start by giving you a bit of context:

  • What log files are and why they matter for SEO
  • How to get hold of them
  • Why Python alone doesn’t always cut it when it comes to server log analysis

#2 – We’ll then set things up:

  • Create a Google Cloud Platform account
  • Create a Google Cloud Storage bucket to store our log files
  • Use the command line to convert our files to a compliant format for querying
  • Transfer our files to Google Cloud Storage, manually and programmatically
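To give a flavor of the conversion step, here is a minimal Python sketch that parses a raw access-log line into CSV fields ready for querying. It assumes the common Apache/Nginx “combined” log format; the regex, field names and sample line below are illustrative only, and the article itself will use command-line tools for this step:

```python
import csv
import io
import re

# Regex for the Apache/Nginx "combined" log format (an assumption:
# adjust the pattern if your server logs in a different format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def log_line_to_row(line):
    """Parse one raw log line into a list of fields, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if not match:
        return None
    g = match.groupdict()
    return [g["ip"], g["time"], g["method"], g["url"],
            g["status"], g["bytes"], g["referer"], g["agent"]]

# Illustrative raw line (made up for the example)
raw = ('66.249.66.1 - - [15/Mar/2020:10:00:00 +0000] '
       '"GET /blog/ HTTP/1.1" 200 5120 "-" '
       '"Mozilla/5.0 (compatible; Googlebot/2.1)"')

# Write the parsed fields out as one CSV row
buffer = io.StringIO()
csv.writer(buffer).writerow(log_line_to_row(raw))
print(buffer.getvalue().strip())
```

A CSV with one row per request like this is a format BigQuery can ingest directly.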

#3 – Finally, we’ll get into the nitty-gritty of Pythoning – we will:

  • Query our log files with BigQuery, inside Colab!
  • Build a data model that makes our raw logs more legible
  • Create categorical columns that will bolster our analyses further down the line
  • Filter and export our results to .csv
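As a small preview of the data-modeling and categorical-column steps, here is a hedged pandas sketch. The toy DataFrame and the bot-matching heuristic are assumptions for illustration; in the post itself the data will come from a BigQuery result set:

```python
import pandas as pd

# Toy parsed-log data; in practice this would come from a BigQuery query.
df = pd.DataFrame({
    "url": ["/blog/", "/blog/", "/product?id=1"],
    "status": [200, 301, 404],
    "agent": ["Googlebot/2.1", "bingbot/2.0", "Mozilla/5.0"],
})

def status_class(code):
    """Bucket raw HTTP status codes into coarse, legible classes."""
    classes = {2: "success", 3: "redirect", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")

# Categorical column: coarse status class
df["status_class"] = df["status"].map(status_class).astype("category")

# Categorical flag: well-known crawlers by substring match
# (a simple heuristic, not an exhaustive bot list)
df["is_bot"] = df["agent"].str.contains("Googlebot|bingbot", case=False)

print(df[["status", "status_class", "is_bot"]])
```

Filtering on such columns (e.g. bot hits returning client errors) and exporting with `df.to_csv()` is then a one-liner.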

In part two of this series (available later this year), we’ll discuss more advanced data modeling techniques in Python to assess:

  • Bot crawl volume
  • Crawl budget waste
  • Duplicate URL crawling

I’ll also show you how to aggregate and join log data to Search Console data, and create interactive visualizations with Plotly Dash!
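The aggregate-and-join idea can be sketched in a few lines of pandas. Everything here is illustrative: the URLs, dates and click counts are made up, and a real Search Console export would have more columns:

```python
import pandas as pd

# Hypothetical per-hit log data, aggregated to daily hit counts per URL
log_agg = (pd.DataFrame({
    "url": ["/blog/", "/blog/", "/pricing"],
    "date": ["2020-03-15", "2020-03-15", "2020-03-16"],
    "hits": [1, 1, 1],
}).groupby(["url", "date"], as_index=False)["hits"].sum())

# Hypothetical Search Console export (clicks per URL per day)
gsc = pd.DataFrame({
    "url": ["/blog/", "/pricing"],
    "date": ["2020-03-15", "2020-03-16"],
    "clicks": [12, 3],
})

# Left join keeps every crawled URL, even those with no Search Console data
joined = log_agg.merge(gsc, on=["url", "date"], how="left")
print(joined)
```

A joined table like this is exactly the kind of input an interactive Plotly Dash dashboard can sit on top of.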

Excited? Let’s get cracking!


