Resolve the Shopify Robots.txt Issue

Ever wondered why a perfectly good page on your Shopify store isn't showing up in search results? It could be because of your robots.txt file. This file tells search engines which pages on your site they should crawl and index, and which ones they should skip.

Have you ever received an "Indexed, though blocked by robots.txt" notification from Google Search Console? Unsure about what it means or what to do next? No need to fret! In this blog, I will cover everything you need to know about this issue and provide a step-by-step guide for resolving it.

Understanding Shopify Robots.txt

Shopify's robots.txt is a file that guides search engine robots, commonly referred to as "crawlers" or "spiders," telling them which pages or sections of a website they should refrain from crawling or indexing.

If your robots.txt file contains the line "Disallow: /checkout", it means that robots will avoid crawling or indexing any pages located within the checkout section of the website.
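
In context, a minimal robots.txt carrying that rule looks like the following (a simplified sketch, not your store's full default file):

    User-agent: *
    Disallow: /checkout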

Shopify automatically generates a robots.txt file for every store. Store owners can further customize it from the Shopify admin to keep specific pages from being crawled or indexed (see the sketch below). This customization is crucial for optimizing your store's overall search performance.
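
Shopify's documented way to make such customizations is a theme template named robots.txt.liquid, built on the robots Liquid object. The sketch below follows the pattern from Shopify's documentation: it reproduces the default rules and appends one extra rule to the catch-all user-agent group. The "/internal-promo/" path is purely a hypothetical example, not a rule you should copy verbatim.

    {% comment %} templates/robots.txt.liquid {% endcomment %}
    {% for group in robots.default_groups %}
      {{- group.user_agent }}
      {%- for rule in group.rules -%}
        {{ rule }}
      {%- endfor -%}
      {%- comment -%} Append a custom rule to the catch-all group {%- endcomment -%}
      {%- if group.user_agent.value == '*' -%}
        {{ 'Disallow: /internal-promo/' }}
      {%- endif -%}
      {%- if group.sitemap != blank -%}
        {{ group.sitemap }}
      {%- endif -%}
    {% endfor %}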

How to Access the Robots.txt File in Shopify?

To access the robots.txt file for your Shopify store, just add "/robots.txt" to the end of your store's URL. Alternatively, you can reach it through your Shopify admin: log in, then navigate to Online Store > Preferences > Search engine listing preview > Edit robots.txt.
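
For example, if your store lived at the placeholder domain used later in this post, the file would be served at:

    https://mystore.com/robots.txt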

Know More About Shopify's Default Robots.txt

The default robots.txt file provided by Shopify automatically blocks Google from indexing irrelevant pages or those that might create duplicate content. This naturally benefits your Shopify store's SEO.

The default configuration of Shopify's robots.txt file contains instructions such as the following:

  • Disallow: /checkout
  • Disallow: /cart
  • Disallow: /orders
  • Disallow: /account
  • Disallow: /search
  • Disallow: /apps
  • Disallow: /services

Furthermore, the robots.txt file may contain the lines "Allow: /" and "Sitemap: https://mystore.com/sitemap.xml", directing robots to the store's sitemap, which provides a comprehensive list of all website URLs for indexing purposes.
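
Put together with a user-agent group, those lines sit in the file roughly like this (a sketch using the mystore.com placeholder, not Shopify's exact default output):

    User-agent: *
    Allow: /
    Sitemap: https://mystore.com/sitemap.xml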

Limiting the indexing of lower-value or redundant pages enables search engines to guide users to more relevant content, resulting in increased targeted traffic.

What Is the Shopify "Indexed, though blocked by robots.txt" Issue?

With this issue, certain pages you intended to exclude from search results are still indexed by Google, despite being blocked in your store's robots.txt file.

The following are the common causes of this issue:

  • The directives in the robots.txt file are incorrect or improperly structured.
  • The robots.txt file has been removed or deleted.
  • Incorrect server configuration prevents the search engine crawlers from accessing the robots.txt file. A misconfigured firewall, improper file permissions, or other server-side issues could be the reasons behind this.
  • Certain third-party apps modify the robots.txt file in ways that leave pages indexed even when they are meant to be blocked.
  • A page blocked by the robots.txt file may still appear in the website's sitemap, so search engines index it anyway.
  • Some blocked pages are still discovered and indexed via links on other pages or dynamic URLs, because robots.txt prevents crawling, not indexing (see the note below).
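
Because of that, blocking a page in robots.txt alone won't guarantee it stays out of Google's index. If you want a page fully excluded, the standard approach is to leave it crawlable and add a noindex robots meta tag to the page's HTML head, for example:

    <meta name="robots" content="noindex">

Google must be able to crawl the page to see this tag, so the page must not simultaneously be disallowed in robots.txt.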

A Comprehensive Guide to Solving the "Indexed, though blocked by robots.txt" Error

To resolve the "Indexed, though blocked by robots.txt" problem, it's essential to verify that the file blocks only the pages you actually wish to keep out of the index. You can either inspect the syntax of your robots.txt file yourself or use Google Search Console to identify which pages are currently being blocked.
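
When inspecting the syntax, watch for rules that are broader than intended; this hypothetical before/after shows how one missing path segment can block far more than planned:

    # Too broad: blocks every collection page
    Disallow: /collections

    # Intended: block only the sorted duplicates of collection pages
    Disallow: /collections/*sort_by*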

Here are the steps to resolve this issue with the help of Google Search Console:

  • First of all, head to the Coverage report and open the "Indexed, though blocked by robots.txt" warning (listed under Valid with warnings). Afterward, in order to export a list of all blocked URLs, select the Download and All URLs options.
  • Examine the list of URLs and determine which pages you wish to have indexed by search engines.
  • Identify the rules within the robots.txt file that are blocking those pages, and either remove or comment out those lines (see the snippet after this list).
  • Test the changes you have made with Google's robots.txt Tester (since replaced by the robots.txt report in Search Console). This confirms that the pages you want indexed are no longer blocked.
  • Click the "VALIDATE FIX" button within Google Search Console to ask Google to re-assess your robots.txt file against your URLs.
  • Keep an eye on the Coverage report for any additional warnings or errors.
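
Commenting out a rule, as mentioned in the third step, simply means prefixing it with "#" so crawlers ignore it; here a hypothetical "Disallow: /blogs" line has been disabled:

    # Disallow: /blogs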

Remember, just one error when editing your robots.txt file can have a notable effect on how your web pages are indexed and on your SEO rankings. If you're not confident, don't do it yourself; we advise taking an expert's help, as editing robots.txt requires technical knowledge.

Which Pages Should Not Be Blocked by Robots.txt?

  • Homepage: The homepage is your website's most crucial page and should always be accessible for crawling and indexing by search engines.
  • Collection/Category Pages: Indexing these pages is very important, as they not only group similar products but also help customers navigate your website.
  • Blog Pages: Blog pages should be indexed, as they play a vital role in driving organic traffic to your website and enhancing your SEO.
  • Product Pages: These pages carry the vital information about your products; indexing them lets customers discover your products through organic search, leading to potential purchases.
  • About Us Page: This page tells the story of your company's journey, helping strengthen your brand and maintain a strong bond with customers.
  • Contact Us Page: The visibility of this page is crucial, as it guides both search engines and customers on how to contact your business.
  • Sitemap Page: Offering a map of your website, this page facilitates easier crawling and indexing by search engines.
  • Privacy Policy and Terms of Service Pages: These pages, detailing your business's approach to handling customer data, must remain accessible to both search engines and customers.

Essentially, any page holding valuable information that you want both search engines and customers to access should not be blocked. If you notice any such pages being blocked by robots.txt, remove them from the robots.txt file and provide Google with a fresh sitemap so they can be reindexed.

Conversely, pages such as login, shopping cart, and checkout pages, along with pages containing duplicate or low-quality content, should be blocked by robots.txt.

Conclusion

The "Shopify Indexed Though Blocked by Robots.txt" issue can be daunting, but thankfully, it's solvable! By understanding the reasons behind it and following the steps outlined in this guide, you can ensure your desired pages are properly indexed and discoverable in search results.

Author: SpeedBoostr (Google Speed ‑ SEO)
Published on: 18-06-2024