
How to set up Robots file?

Author: Site Editor     Publish Time: 2016-06-28

What are Robots files?

The robots.txt file is an important channel of communication between a site and search-engine spiders. Through it, a site declares which content it does not want included in search engines, or specifies that search engines may include only certain parts. Note that you only need a robots.txt file if your site contains content you do not want indexed; if you want search engines to include everything on the site, do not create a robots.txt file at all. At present, the Robots file preset in the lead system allows all content to be included by search engines.

The format of the robots.txt file
The robots.txt file is usually placed in the site's root directory and contains one or more records separated by blank lines (terminated by CR, CR/LF, or LF). Each record takes the form "<field>:<optional space><value><optional space>".
In this file, # can be used for comments, following the same convention as in UNIX. Records usually start with one or more User-agent lines, followed by several Disallow and Allow lines. The details are as follows:
User-agent: The value of this field names the search-engine robot the record applies to. If the "robots.txt" file contains multiple User-agent records, several robots are restricted by it; there must be at least one User-agent record. If the value is set to *, the record is valid for every robot, and only one "User-agent: *" record may appear in the file. If you add "User-agent: SomeBot" followed by several Disallow and Allow lines, then the robot named "SomeBot" is bound only by the Disallow and Allow lines that come after "User-agent: SomeBot".
Disallow: The value of this field describes a set of URLs that you do not want accessed. It can be a complete path or a non-empty path prefix; any URL beginning with this value will not be visited by robots. For example, "Disallow: /help" prevents robots from accessing /help.html, /helpabc.html, and /help/index.html, whereas "Disallow: /help/" still allows robots to access /help.html and /helpabc.html but not /help/index.html. An empty "Disallow:" means robots may access every URL on the site. There must be at least one Disallow record in "/robots.txt". If "/robots.txt" does not exist or is an empty file, the website is open to all search-engine robots.
Allow: The value of this field describes a set of URLs that you do want accessed. As with Disallow, the value can be a complete path or a path prefix; any URL beginning with this value may be visited by robots. For example, "Allow: /hibaidu" permits robots to access /hibaidu.htm, /hibaiducom.html, and /hibaidu/com.html. All URLs on a website are allowed by default, so Allow is usually used together with Disallow to permit access to some pages while blocking all others.
Using "*" and "$": Baiduspider supports the wildcards "*" and "$" for fuzzy URL matching.
"*" matches zero or more arbitrary characters.
"$" matches the end of the URL.
One last point: Baidu strictly abides by the robots protocol, and matching is case-sensitive. Make sure the files and directories you list in robots.txt exactly match the case of the paths you do not want crawled or indexed; otherwise the rules will not take effect.
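The prefix-matching behaviour of Disallow described above can be checked with Python's standard urllib.robotparser module (note that this parser implements plain prefix matching only, not the "*" and "$" wildcard extensions; example.com is a placeholder domain):

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /help" blocks every path that merely starts with "/help".
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /help"])
print(rp.can_fetch("*", "https://example.com/help.html"))      # blocked -> False
print(rp.can_fetch("*", "https://example.com/helpabc.html"))   # blocked -> False

# "Disallow: /help/" blocks only paths under the /help/ directory.
rp2 = RobotFileParser()
rp2.parse(["User-agent: *", "Disallow: /help/"])
print(rp2.can_fetch("*", "https://example.com/help.html"))       # allowed -> True
print(rp2.can_fetch("*", "https://example.com/help/index.html")) # blocked -> False
```

This matches the /help vs. /help/ distinction in the Disallow description above.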

Commonly used ways to write a Robots file

1. Allow all search engines to access
User-agent: *
Allow: /
Note that the simplest option here is to create an empty "robots.txt" file and place it in the root directory of the website, which has the same effect.
2. Block all search engines from accessing the site
User-agent: *
Disallow: /
3. Block all search engines from accessing certain parts of the website (the directories a, b, and c are used here as placeholders)
User-agent: *
Disallow: /a/
Disallow: /b/
Disallow: /c/
To allow access to these directories instead, write:
Allow: /a/
Allow: /b/
Allow: /c/
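The multi-directory pattern in example 3 can be verified with Python's standard urllib.robotparser (the directory names a, b, c and the domain example.com are placeholders from the text):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /a/",
    "Disallow: /b/",
    "Disallow: /c/",
])

# The three listed directories are blocked...
print(rp.can_fetch("*", "https://example.com/a/page.html"))  # False
# ...but everything else on the site remains accessible.
print(rp.can_fetch("*", "https://example.com/d/page.html"))  # True
```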
4. Block one particular search engine (w is used here as a placeholder)
User-agent: w
Disallow: /
Or, to block that engine from only part of the site:
User-agent: w
Disallow: /d/*.htm
Adding /d/*.htm after Disallow: forbids access to all URLs with the ".htm" suffix in the /d/ directory, including its subdirectories.
5. Allow only a certain search engine to access the site (e is used here as a placeholder)
User-agent: e
Disallow:
User-agent: *
Disallow: /
Leaving the value after "Disallow:" empty means e may access everything; the "User-agent: *" record is needed to block all other engines.
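A complete "only allow one engine" file needs both records: an empty Disallow for the allowed robot and a blanket Disallow for everyone else. A quick check with Python's standard urllib.robotparser (GoodBot and OtherBot are placeholder robot names):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: GoodBot",   # placeholder for the one permitted engine
    "Disallow:",             # empty value: GoodBot may fetch everything
    "",
    "User-agent: *",
    "Disallow: /",           # every other robot is blocked entirely
])

print(rp.can_fetch("GoodBot", "https://example.com/page.html"))   # True
print(rp.can_fetch("OtherBot", "https://example.com/page.html"))  # False
```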
6. Use "$" to restrict access to URLs
User-agent: *
Allow: .htm$
Disallow: /
This means that only URLs ending in ".htm" can be accessed.
7. Prohibit access to all dynamic pages in the website
User-agent: *
Disallow: /*?*
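Python's built-in robots parser does not understand the "*" and "$" extensions, but their matching behaviour can be sketched with a small regex translation (an illustrative helper, not part of any standard library):

```python
import re

def robots_pattern_match(pattern: str, path: str) -> bool:
    """Match a URL path against a robots.txt value that may use the
    '*' (any characters) and '$' (end of URL) wildcards."""
    regex = re.escape(pattern).replace(r"\*", ".*")  # '*' -> any run of characters
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"                     # trailing '$' anchors the end
    return re.match(regex, path) is not None

# "Disallow: /*?*" from example 7 matches any URL containing a query string:
print(robots_pattern_match("/*?*", "/product.php?id=3"))  # True
print(robots_pattern_match("/*?*", "/product.html"))      # False

# A trailing "$" restricts the match to a suffix, e.g. ".htm" pages:
print(robots_pattern_match("/*.htm$", "/docs/page.htm"))   # True
print(robots_pattern_match("/*.htm$", "/docs/page.html"))  # False
```

Without "$", a pattern keeps matching as a prefix, which is why "/*.htm" alone would also catch ".html" pages.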
8. Forbid search engine F from crawling any pictures on the website (F is used here as a placeholder)
User-agent: F
Disallow: .jpg$
Disallow: .jpeg$
Disallow: .gif$
Disallow: .png$
Disallow: .bmp$
This means the engine may crawl web pages only and is forbidden from crawling any pictures (strictly speaking, pictures in .jpg, .jpeg, .gif, .png, and .bmp format).
9. Only search engine E is allowed to crawl web pages and .gif format pictures
User-agent: E
Allow: .gif$
Disallow: .jpg$
Disallow: .jpeg$
Disallow: .png$
Disallow: .bmp$
This means that only web pages and .gif pictures may be crawled; pictures in other formats are all blocked.
Most search-engine robots abide by the rules in robots files, and the above covers the common ways of writing them. We should remind everyone that the robots.txt file must be written correctly: if you are not sure how, study the syntax before writing it, so that it does not cause trouble for the site's indexing.

Where to set the Robots file in the lead system:

Step 1: Log in to the lead system and do the following:

[Screenshot: Robots file configuration]

Step 2: Set the Robots file as shown in the figure below and save it;

[Screenshot: Edit page]

Step 3: Save and publish to take effect.

[Screenshot: Save and publish]

If a single page on the website should not be indexed, you can add a robots meta tag such as <meta name="robots" content="noindex"> to the source code of that page. This needs to be added by lead system staff; if you have such a need, please contact QQ: 2417402658.
