
SEO

8 Useful Python Libraries for SEO & How To Use Them



Editor’s note: As 2021 winds down, we’re celebrating with a 12 Days of Christmas Countdown of the most popular, helpful expert articles on Search Engine Journal this year.

This collection was curated by our editorial team based on each article’s performance, utility, quality, and the value created for you, our readers.

Each day until December 24th, we’ll repost one of the best columns of the year, starting at No. 12 and counting down to No. 1. Our countdown continues today with our No. 3 column, which was originally published on March 18, 2021.

Ruth Everett’s article on using Python libraries to automate and accomplish SEO tasks makes a marketer’s work so much easier. It’s easy to read and ideal for beginners as well as more experienced SEO professionals who want to use Python more.

Great work on this, Ruth, and we really appreciate your contributions to Search Engine Journal.

Enjoy!   


Python libraries are a fun and accessible way to get started with learning and using Python for SEO.


A Python library is a collection of useful functions and code that allow you to complete a number of tasks without needing to write the code from scratch.

There are over 100,000 libraries available in Python, covering everything from data analysis to creating video games.

In this article, you’ll find several different libraries I have used for completing SEO projects and tasks. All of them are beginner-friendly and you’ll find plenty of documentation and resources to help you get started.

Why Are Python Libraries Useful For SEO?

Each Python library contains functions and variables of all types (arrays, dictionaries, objects, etc.) which can be used to perform different tasks.

For SEO, for example, they can be used to automate certain things, predict outcomes, and provide intelligent insights.

It is possible to work with just vanilla Python, but libraries can be used to make tasks much easier and quicker to write and complete.

Python Libraries For SEO Tasks

There are a number of useful Python libraries for SEO tasks including data analysis, web scraping, and visualizing insights.


This is not an exhaustive list, but these are the libraries I find myself using the most for SEO purposes.

Pandas

Pandas is a Python library used for working with tabular data. It allows for high-level data manipulation, and its key data structure is the DataFrame.

DataFrames are similar to Excel spreadsheets; however, they are not bound by Excel’s row and file-size limits (they are limited mainly by your computer’s memory), and they are also much faster and more efficient.

The best way to get started with Pandas is to take a simple CSV of data (a crawl of your website, for example) and load it into Python as a DataFrame.

Once you have this stored in Python, you can perform a number of different analysis tasks including aggregating, pivoting, and cleaning data.

For example, if I have a complete crawl of my website and want to extract only those pages that are indexable, I can filter the DataFrame so that it includes only those URLs.

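import pandas as pd

# Load the crawl export (replace the path with your own CSV file)
df = pd.read_csv('/Users/rutheverett/Documents/Folder/file_name.csv')
df.head()  # preview the first five rows

# Keep only the rows where the indexable column is True
indexable = df[df['indexable'] == True]
indexable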
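To illustrate the aggregating and pivoting mentioned above, here is a minimal sketch that uses a small hypothetical crawl table built in code (not real crawl data), so it runs without a CSV file. The column names `url`, `section`, and `status_code` are assumptions; a real crawl export will likely use different headers.

```python
import pandas as pd

# Hypothetical crawl data standing in for a real crawl export (illustrative only)
crawl = pd.DataFrame({
    "url": ["/", "/blog/", "/blog/post-1", "/old-page", "/blog/post-2"],
    "section": ["home", "blog", "blog", "other", "blog"],
    "status_code": [200, 200, 200, 404, 301],
})

# Aggregate: count how many URLs return each status code
status_counts = crawl.groupby("status_code")["url"].count()

# Pivot: URL counts per site section (rows) and status code (columns)
section_by_status = crawl.pivot_table(
    index="section",
    columns="status_code",
    values="url",
    aggfunc="count",
    fill_value=0,  # sections with no URLs for a status code get 0 instead of NaN
)

print(status_counts)
print(section_by_status)
```

Swapping in your own crawl DataFrame and column names is usually all that is needed to reuse the same `groupby` and `pivot_table` calls.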

Requests

The next library is called Requests and is used to make HTTP requests in Python.

Requests supports the standard HTTP methods, such as GET and POST, and returns the server’s response as an object you can work with in Python.

One example of this in action is a simple GET request for a URL, which prints the response, including the page’s status code:

import requests
response = requests.get('https://www.deepcrawl.com')
print(response)  # the printed Response object shows the status code

You can then use this result to build decision-making logic, where a 200 status code means the page is available and a 404 means the page is not found.

if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found.')

You can also inspect the response headers, which contain useful information about the page, such as the content type or how long the response may be cached.

headers = response.headers
print(headers)

response.headers['Content-Type']

You can also simulate a specific user agent, such as Googlebot, to see the response the server returns to that bot when it crawls the page.

headers = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}
ua_response = requests.get('https://www.deepcrawl.com/', headers=headers)
print(ua_response)
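The status-code decision logic above can be extended into a small checker for a list of URLs. This is a sketch, not the author’s code: the helper names `classify_status` and `check_urls` and their labels are hypothetical, and redirects are deliberately not followed so that 3xx codes show up in the results.

```python
import requests


def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a simple audit label (labels are hypothetical)."""
    if status_code == 404:
        return "Not Found"
    if 200 <= status_code < 300:
        return "OK"
    if 300 <= status_code < 400:
        return "Redirect"
    if 400 <= status_code < 500:
        return "Client Error"
    if 500 <= status_code < 600:
        return "Server Error"
    return "Other"


def check_urls(urls, timeout=10):
    """Request each URL and return (url, status_code, label) tuples."""
    results = []
    with requests.Session() as session:  # reuse one connection pool for all URLs
        for url in urls:
            # allow_redirects=False reports 3xx codes instead of following them
            response = session.get(url, timeout=timeout, allow_redirects=False)
            results.append((url, response.status_code, classify_status(response.status_code)))
    return results
```

For example, `check_urls(["https://www.deepcrawl.com"])` returns one tuple per URL, which can be loaded straight into a Pandas DataFrame for further analysis.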

Beautiful Soup

Beautiful Soup is a library used to extract data from HTML and XML files.


Fun fact: The BeautifulSoup library was actually named after the poem from Alice’s Adventures in Wonderland by Lewis Carroll.

As a library, BeautifulSoup is used to make sense of web files and is most often used for web scraping, as it can transform an HTML document into different Python objects.

For example, you can take a URL and use Beautiful Soup together with the Requests library to extract the title of the page.

from bs4 import BeautifulSoup 
import requests
url="https://www.deepcrawl.com" 
req = requests.get(url) 
soup = BeautifulSoup(req.text, "html.parser")
title = soup.title
print(title)


Additionally, using the find_all method, BeautifulSoup enables you to extract specific elements from a page, such as the href of every <a> link on the page:


url="https://www.deepcrawl.com/knowledge/technical-seo-library/" 
req = requests.get(url) 
soup = BeautifulSoup(req.text, "html.parser")

for link in soup.find_all('a'): 
    print(link.get('href'))
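Building on `soup.title` and `find_all`, a quick on-page check often pulls several elements at once. The sketch below parses a hypothetical HTML string instead of a live page so it runs offline; the helper name `extract_seo_elements` and the chosen fields are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for req.text from a requests.get() call
SAMPLE_HTML = """
<html><head>
  <title>Technical SEO Library</title>
  <meta name="description" content="Guides to technical SEO.">
  <link rel="canonical" href="https://www.example.com/library/">
</head><body>
  <h1>Technical SEO Library</h1>
  <a href="/guide-1">Guide 1</a>
  <a href="https://other.example.org/">External</a>
  <a>No href</a>
</body></html>
"""


def extract_seo_elements(html: str) -> dict:
    """Pull a few common on-page SEO elements out of an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    description = soup.find("meta", attrs={"name": "description"})
    canonical = soup.find("link", rel="canonical")
    return {
        # .string is None if <title> is missing or contains nested tags
        "title": soup.title.string if soup.title else None,
        "meta_description": description.get("content") if description else None,
        "canonical": canonical.get("href") if canonical else None,
        "h1": [h.get_text(strip=True) for h in soup.find_all("h1")],
        # href=True skips <a> tags that have no href attribute
        "links": [a["href"] for a in soup.find_all("a", href=True)],
    }


print(extract_seo_elements(SAMPLE_HTML))
```

Passing `href=True` to `find_all` avoids the `None` values that `link.get('href')` prints for anchors without an href.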


Putting Them Together

These three libraries can also be used together: Requests makes the HTTP request to the page, and BeautifulSoup extracts information from the HTML it returns.

We can then transform that raw data into a Pandas DataFrame to perform further analysis.

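import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.deepcrawl.com/blog/'
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")

# Store the href and anchor text of each <a> tag rather than the raw Tag objects
links = [{'href': a.get('href'), 'anchor_text': a.get_text(strip=True)}
         for a in soup.find_all('a')]

df = pd.DataFrame(links)
df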
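Once links are in a DataFrame, a common next step is resolving relative URLs and flagging internal versus external links. This is a sketch under assumptions: the helper `links_to_dataframe` is hypothetical, and it treats a link as internal only when its host exactly matches the page’s host (so `example.com` and `www.example.com` would count as different sites).

```python
from urllib.parse import urljoin, urlparse

import pandas as pd
from bs4 import BeautifulSoup


def links_to_dataframe(html: str, page_url: str) -> pd.DataFrame:
    """Return one row per <a href> with its absolute URL and an internal flag."""
    soup = BeautifulSoup(html, "html.parser")
    site_host = urlparse(page_url).netloc
    rows = []
    for a in soup.find_all("a", href=True):
        absolute = urljoin(page_url, a["href"])  # resolve relative links like /blog/post-1
        rows.append({
            "href": a["href"],
            "absolute_url": absolute,
            "anchor_text": a.get_text(strip=True),
            "internal": urlparse(absolute).netloc == site_host,
        })
    return pd.DataFrame(rows, columns=["href", "absolute_url", "anchor_text", "internal"])


# Hypothetical page HTML; in practice pass req.text from a requests.get() call
sample_html = (
    '<a href="/blog/post-1">Post 1</a>'
    '<a href="https://other.example.org/">Elsewhere</a>'
    '<a href="/blog/post-1">Again</a>'
)
links_df = links_to_dataframe(sample_html, "https://www.example.com/blog/")
print(links_df)
# Unique destination URLs, split by internal vs external
print(links_df.groupby("internal")["absolute_url"].nunique())
```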

Matplotlib And Seaborn

Matplotlib and Seaborn are two Python libraries used for creating visualizations.

Matplotlib allows you to create a number of different data visualizations such as bar charts, line graphs, histograms, and even heatmaps.


For example, if I wanted to take some Google Trends data to display the queries with the most popularity over a period of 30 days, I could create a bar chart in Matplotlib to visualize all of these.
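A bar chart like the one described takes only a few Matplotlib calls. The sketch below uses hypothetical interest scores, not real Google Trends data, and the query names are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical Google Trends-style scores (0-100), for illustration only
trends = pd.DataFrame({
    "query": ["python seo", "log file analysis", "xml sitemap", "robots.txt"],
    "interest": [78, 45, 62, 30],
}).sort_values("interest", ascending=False)  # most popular query first

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(trends["query"], trends["interest"], color="steelblue")
ax.set_title("Search interest over the last 30 days")
ax.set_xlabel("Query")
ax.set_ylabel("Relative interest (0-100)")
fig.tight_layout()
fig.savefig("trends_bar.png")  # or call plt.show() in an interactive session
```

Sorting before plotting keeps the bars in descending order, which makes the most popular queries easy to spot.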


Seaborn, which is built on top of Matplotlib, makes it easy to create visualizations such as scatterplots, box plots, and violin plots, as well as line and bar graphs.

It differs slightly from Matplotlib in that it requires less code and comes with built-in default themes.


One way I’ve used Seaborn is to create line graphs in order to visualize log file hits to certain segments of a website over time.


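import matplotlib.pyplot as plt
import seaborn as sns
sns.lineplot(x="month", y="log_requests_total", hue="category", data=pivot_status)  # pivot_status: DataFrame built earlier with Pandas
plt.show()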

This particular example takes data from a pivot table, which I was able to create in Python using the Pandas library, and is another way these libraries work together to create an easy-to-understand picture from the data.

Advertools

Advertools is a library created by Elias Dabbas that can be used to help manage, understand, and make decisions based on the data we have as SEO professionals and digital marketers.


Sitemap Analysis

This library allows you to perform a number of different tasks such as downloading, parsing, and analyzing XML Sitemaps to extract patterns or analyze how often content is added or changed.
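advertools provides functions for this (for example, `sitemap_to_df`). To show what sitemap parsing involves under the hood, the sketch below swaps in the standard library’s `xml.etree.ElementTree` plus Pandas, and parses a hypothetical sitemap string so it runs offline. The helper name `sitemap_to_frame` is hypothetical and is not the advertools API.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sitemap content; a real one would be fetched from e.g. /sitemap.xml
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc><lastmod>2021-01-15</lastmod></url>
  <url><loc>https://www.example.com/blog/post-1</loc><lastmod>2021-02-03</lastmod></url>
  <url><loc>https://www.example.com/blog/post-2</loc><lastmod>2021-02-20</lastmod></url>
</urlset>"""

# Sitemap elements live in this XML namespace
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_to_frame(xml_text: str) -> pd.DataFrame:
    """Parse the <url> entries of a urlset sitemap into a DataFrame."""
    root = ET.fromstring(xml_text)
    rows = [
        {"loc": url.findtext("sm:loc", namespaces=NS),
         "lastmod": url.findtext("sm:lastmod", namespaces=NS)}
        for url in root.findall("sm:url", NS)
    ]
    df = pd.DataFrame(rows)
    df["lastmod"] = pd.to_datetime(df["lastmod"])  # missing dates become NaT
    return df


sitemap = sitemap_to_frame(SITEMAP_XML)
# How many URLs were added or changed per month
monthly = sitemap.groupby(sitemap["lastmod"].dt.to_period("M")).size()
print(monthly)
```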

Robots.txt Analysis

Another interesting thing you can do with this library is to use a function to extract a website’s robots.txt into a DataFrame, in order to easily understand and analyze the rules set.

You can also run a test within the library in order to check whether a particular user-agent is able to fetch certain URLs or folder paths.
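advertools offers `robotstxt_to_df` and `robotstxt_test` for these tasks. As a dependency-free illustration of the same kind of test, this sketch swaps in Python’s built-in `urllib.robotparser` and a hypothetical robots.txt. Note that the standard-library parser applies the first matching rule and does not implement Google’s wildcard or longest-match handling, so results can differ from Googlebot’s for complex files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; a real file lives at https://<domain>/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check whether each user agent may fetch each URL
checks = [
    ("Googlebot", "https://www.example.com/search?q=seo"),
    ("Googlebot", "https://www.example.com/admin/"),
    ("Bingbot", "https://www.example.com/admin/"),
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

Googlebot may fetch `/admin/` here because it has its own group, so the `*` rules no longer apply to it, a detail that is easy to miss when reading robots.txt by eye.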

URL Analysis

Advertools also enables you to parse and analyze URLs in order to extract information and better understand analytics, SERP, and crawl data for certain sets of URLs.

You can also split URLs using the library to determine things such as the HTTP scheme being used, the main path, additional parameters, and query strings.
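advertools has a function for this (`url_to_df`) that produces a richer table. The sketch below shows the idea with the standard library’s `urllib.parse` and Pandas instead; the helper `split_url`, its column names, and the sample URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

import pandas as pd

# Hypothetical URLs, e.g. exported from analytics or a crawl
urls = [
    "https://www.example.com/blog/python-seo?utm_source=twitter&utm_medium=social",
    "http://www.example.com/products/shoes?color=red",
    "https://www.example.com/about",
]


def split_url(url: str) -> dict:
    """Split a URL into scheme, host, path, first directory, and query parameters."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return {
        "url": url,
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "path": parts.path,
        "dir_1": segments[0] if segments else None,
        "query": parts.query,
        # parse_qs maps each parameter name to a list of its values
        "params": parse_qs(parts.query),
    }


url_df = pd.DataFrame([split_url(u) for u in urls])
print(url_df[["scheme", "dir_1", "query"]])
```

Grouping on `dir_1` or `scheme` then makes it easy to spot, for instance, sections still served over HTTP or URLs carrying tracking parameters.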

Selenium

Selenium is a Python library that is generally used for automation purposes. The most common use case is testing web applications.


One popular example of Selenium automating a flow is a script that opens a browser and performs a number of different steps in a defined sequence such as filling in forms or clicking certain buttons.

Selenium employs the same principle as the Requests library we covered earlier.

However, it not only sends the request and waits for the response but also renders the webpage being requested.

To get started with Selenium, you will need a WebDriver in order to make the interactions with the browser.

Each browser has its own WebDriver; Chrome has ChromeDriver and Firefox has GeckoDriver, for example.

These are easy to download and set up with your Python code. Here is a useful article explaining the setup process, with an example project.

Scrapy

The final library I wanted to cover in this article is Scrapy.

While we can use the Requests module to fetch a webpage and extract its data, to parse that data and extract useful insights we also need to combine it with BeautifulSoup.


Scrapy essentially allows you to do both of these in one library.

Scrapy is also considerably faster and more powerful: it sends requests asynchronously, extracts and parses data in a set sequence, and lets you store the extracted data.

Within Scrapy, you can define a number of instructions such as the name of the domain you would like to crawl, the start URL, and certain page folders the spider is allowed or not allowed to crawl.

Scrapy can be used to extract all of the links on a certain page and store them in an output file, for example.

import scrapy


class SuperSpider(scrapy.Spider):
    name = "extractor"
    allowed_domains = ['www.deepcrawl.com']
    start_urls = ['https://www.deepcrawl.com/knowledge/technical-seo-library/']

    def parse(self, response):
        # Every link inside a <p> that sits inside a <div>
        for link in response.xpath('//div/p/a'):
            href = link.xpath('@href').get()
            if href:
                # urljoin turns relative hrefs into absolute URLs
                yield {"link": response.urljoin(href)}

You can take this one step further and follow the links found on a webpage to extract information from all the pages which are being linked to from the start URL, kind of like a small-scale replication of Google finding and following links on a page.

import scrapy


class SuperSpider(scrapy.Spider):
    name = "follower"
    allowed_domains = ['en.wikipedia.org']
    start_urls = ['https://en.wikipedia.org/wiki/Web_scraping']

    # Only crawl pages linked directly from the start URL
    custom_settings = {
        'DEPTH_LIMIT': 1
    }

    def parse(self, response):
        # Follow every link found inside a paragraph; response.follow handles relative URLs
        for next_page in response.xpath('.//div/p/a'):
            yield response.follow(next_page, self.parse)

        # Record the page's <h1> text, including text inside nested tags
        title = ''.join(response.xpath('//h1//text()').getall()).strip()
        if title:
            yield {'title': title}

Learn more about these projects, among other example projects, here.

Final Thoughts

As Hamlet Batista always said, “the best way to learn is by doing.”


I hope that discovering some of the libraries available has inspired you to get started with learning Python, or to deepen your knowledge.

Python Contributions From The SEO Industry

Hamlet also loved sharing resources and projects from those in the Python SEO community. To honor his passion for encouraging others, I wanted to share some of the amazing things I have seen from the community.

As a wonderful tribute to Hamlet and the SEO Python community he helped to cultivate, Charly Wargnier has created SEO Pythonistas to collect contributions of the amazing Python projects those in the SEO community have created.

Hamlet’s priceless contributions to the SEO community are featured there.

Moshe Ma-yafit created a super cool script for log file analysis, and in this post explains how the script works. The visualizations it can display include Google Bot Hits By Device, Daily Hits by Response Code, Response Code % Total, and more.

Koray Tuğberk GÜBÜR is currently working on a Sitemap Health Checker. He also hosted a RankSense webinar with Elias Dabbas where he shared a script that records SERPs and Analyses Algorithms.


It essentially records SERPs at regular intervals, so you can crawl all the landing pages, blend the data, and look for correlations.

John McAlpin wrote an article detailing how you can use Python and Data Studio to spy on your competitors.

JC Chouinard wrote a complete guide to using the Reddit API. With it, you can do things such as extracting data from Reddit and posting to a subreddit.

Rob May is working on a new GSC analysis tool and is building a few new domains/real sites in Wix to measure the platform against its higher-end competitor, WordPress, while documenting the process.

Masaki Okazawa also shared a script that analyzes Google Search Console data with Python.



Featured image: jakkaje879/Shutterstock






Small Business Search Trends On The Rise In 2022



A new report from Semrush reveals searches related to small businesses, particularly ‘opening’ a small business, are on the rise.

The report details the business categories and specific search queries gaining traction and offers insight into what areas of marketing businesses are investing in.

Data in the report is based on the keyword and search volume intel collected by Semrush.

After analyzing traffic growth trends and organic search performance over time, Semrush shares which small business categories manage to do better online.

Here are some key highlights from the report.

Search Trends Around Opening A Business

Looking at search volume for various searches that indicate an intent to open a business, the report finds:

  • Over the past four years, the number of “open business” searches has grown by 21%.
  • The majority of “open business” searches occur in January and March.
  • From 2018 to 2022, searches for all things related to starting a small business spiked by 76%.

Most Popular Small Business Categories

While general interest in starting a small business is spiking, search volume indicates aspiring business owners are looking to open boutiques:

  • Almost one-fifth of all entrepreneurs-to-be want to open a boutique.
  • Starting an Etsy business looks attractive to almost one-fifth of all the searchers.
  • Vending machines appear to be gaining the most significant traction, as the category broke into the second spot of most-searched small businesses.

Analyzing search trends across regions, the report finds:

  • The “Etsy, cleaning, boutique” triad is present—fully or partially—across each state’s top 3.
  • In exactly half the states, coffee shops also make it into the top 3 most searched small business categories.
  • Montana and Vermont searchers also consider delivery services as a potential undertaking.

Most Frequent ‘Small Business’ Related Searches

Half of all the top small business-related searches are related to financing.

Here are the top queries, ordered by average monthly searches:

  1. Small business loans
  2. Small business grants
  3. Small business administration
  4. Small business ideas
  5. How to start a small business

Small Business Searches Related To Marketing

Keyword stats indicate small business owners try to embrace all the up-and-coming trends:

  • Searches for digital marketing services surged by 1,500% (especially fast during the pandemic).
  • Interest in creating short videos for small businesses grew by 420%.
  • 600% more people were looking up free text message marketing in 2022 than in 2018.

Small Business Site Categories With the Highest Traffic Growth

The report finds the average traffic growth for small businesses across the board was 2,900%.

Semrush states:

“This means that over the past 4 years, most of the websites within our client list managed to expand their visitor base.”

Top 10 Small Business Site Categories By Share of High-Ranking Organic Keywords

The report explores which small business site categories have the largest share of high-ranking organic keywords.

Here are the categories listed in order, followed by the median number of organic keywords for which the domain ranks in the top 10:

  1. Publishing: 45,581
  2. Online Media: 10,116
  3. Veterinary: 9,379
  4. Entertainment: 6,627
  5. Consumer Services: 3,518
  6. Consumer Goods: 3,339
  7. Building Materials: 2,957
  8. Music: 2,593
  9. Human Resources: 2,145
  10. Food & Beverages: 1,839

What’s interesting to note here is how the top 10 categories by high-ranking keywords don’t line up with the fastest-growing site categories by traffic. Semrush suggests this could mean ranking for a high number of keywords might not directly lead to traffic growth.


Featured Image: Monkey Business Images/Shutterstock






Google Top Stories Topics Sections On Desktop



For some queries, Google Search is now grouping its Top Stories sections by topic. This has worked on mobile for a while but is now reportedly showing in the desktop Top Stories results as well.

Here is a screenshot for a query on [biden] where Google Search on desktop is showing a topic section for abortion articles and a topic section for the US military in Europe. You can also check it out yourself for that query.


Yes, the layout is not new; it dates from December 2021. But the topics on desktop search are new, according to Shalom Goodman, who is a news SEO and would know this.

He shared more screenshots on Twitter:

It makes sense to group some queries by topics for the Top Stories section.

Also, I wasn’t sure if this was new, but the “Topics In News” section on the right is supposedly new:

Forum discussion at Twitter.






64.2% Of Sites Use WordPress



WordPress continues to dominate the content management systems (CMS) market and is currently used by 64.2% of websites that have a CMS, according to data from W3Techs.com.

Shopify is a distant second for June 2022 and accounts for 6.3% of the CMS market.

Wix, Squarespace, and Joomla round out the top five with less than 3.5% market share each.

CMS Market Share June 2022 is available to reprint with attribution; see Creative Commons license for details.

W3Techs notes that 33.1% of websites do not use any of the content management systems they monitor.

WordPress is therefore used by 43% of all websites, and 64.2% of those with an identifiable CMS.
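The 43% figure follows from the two W3Techs numbers: WordPress’s share of sites with an identifiable CMS, multiplied by the share of all sites that use a monitored CMS at all:

$$0.642 \times (1 - 0.331) = 0.642 \times 0.669 \approx 0.43$$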

WordPress Plans To Continue Working On Security, Stability

WordPress shows no signs of slowing down and is currently about five years into a ten-year project that involves rewriting its entire codebase.

In a recent interview, Josepha Haden Chomphosy, Executive Director of WordPress, told SEJ,

“…the next year, as with all of the years in a project like that, is making sure we are still as stable and capable as a CMS as people have come to expect while also still pushing forward with a newer more modern way to manage your content online.”

WordPress rolled out version 6.0, “Arturo,” this month, and within two weeks, 36.2% of WP sites had updated to it.

Roger Montti also reported that WordPress shared a proposal for a plugin checker that would improve security and site performance by proactively vetting plugins.

Shopify Enters B2B Marketplace With June Update

Shopify released its Summer ’22 Edition in June, adding more than 100 new features for users.

A new feature simply and aptly called “B2B” will connect Shopify Plus merchants with wholesalers and offer integrations with NetSuite, Brightpearl, Acumatica, and others for a more seamless experience.

See Brian Frederick’s coverage here to learn more.

Wix Publishes Structured Data Guide For SEO Pros

Wix, in third place for CMS market share this month, released “Wix Structured Data Guide: How To Use Standard & Custom Markup” in June.

Contributing author Mordy Oberstein, Head of SEO Branding at Wix, shared his comprehensive guide to implementing structured data on Wix here at Search Engine Journal.

“In fewer than three years, Wix went from supporting little by way of structured data to offering SEO pros and site owners the ability to do nearly whatever they want with relative ease,” Oberstein wrote.

He also noted that due to recent platform updates, any content elsewhere on the internet around this topic is now out of date.

See his guide above to learn more about applying structured data to your Wix site.

Stay tuned for next month’s CMS Market Share Monthly report.



Featured image: Paulo Bobita/Search Engine Journal
Data source: W3Techs.com, Usage statistics of content management systems, as of June 27, 2022.






Copyright © 2021 Liveseo.com