A website may look healthy and have plenty of good external links, yet its ranking still fails to improve. If a large amount of content cannot be crawled normally, the site likely has problems that prevent the spider from fetching its pages. So, what causes abnormal spider crawling?
1. DNS exception: a DNS exception occurs when the spider cannot resolve your website's IP. The IP address may be wrong, or the domain name service provider may have banned the spider. Use WHOIS or the host command to check whether your website's IP address is correct and resolvable; if it is incorrect or cannot be resolved, contact your domain name registrar to update the record.
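The resolution check can be sketched in Python; here `socket.gethostbyname` stands in for the host command, and the domain names are placeholders:

```python
import socket

def resolve_domain(domain):
    """Try to resolve a domain to an IP address, as a DNS health check.

    Returns the IP string on success, or None if resolution fails --
    the kind of DNS exception described above.
    """
    try:
        return socket.gethostbyname(domain)
    except socket.gaierror:
        return None

# "localhost" always resolves; a name under the reserved .invalid
# TLD never does.
print(resolve_domain("localhost"))              # an IP such as 127.0.0.1
print(resolve_domain("no-such-host.invalid"))   # None
```

If this returns None for your own domain, the problem is on the DNS side, not the web server.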
2. IP ban: an IP ban restricts access by egress IP address, blocking visitors from a given IP segment; here it means banning Baidu Spider's IPs specifically. This setting is only appropriate when you do not want spiders to visit your website. If you do want Baiduspider to crawl your site, check whether a spider IP was added to the block settings by mistake. The hosting provider where your website runs may also have banned Baidu's IPs; in that case, contact the provider to change the settings.
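Checking whether a spider IP falls into a blocked segment can be done with Python's ipaddress module; the blocklist below is a hypothetical example, not a real firewall configuration:

```python
import ipaddress

# Hypothetical blocked segments, as they might appear in a firewall
# or access-control configuration.
BLOCKED_SEGMENTS = [ipaddress.ip_network("203.0.113.0/24")]

def is_ip_blocked(ip):
    """Return True if the IP falls inside any blocked segment."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_SEGMENTS)

print(is_ip_blocked("203.0.113.10"))  # True: inside the banned /24
print(is_ip_blocked("198.51.100.1"))  # False: outside every segment
```

Running each published spider IP through a check like this is a quick way to confirm none were banned by accident.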
3. Abnormal server connection: this comes in two forms: either the site is unstable and the spider temporarily fails to connect to your server, or the spider has been consistently unable to connect. The usual cause is that your web server is overloaded; it is also possible that the website is not running properly. Check that the web server (such as Apache or IIS) is installed and running normally, and use a browser to confirm the main page loads. Your website or host may also be blocking spider access; check the firewalls on both.
4. Abnormal network operators: there are two main network operators, Telecom and Unicom. If the spider cannot access your website through the Telecom or Unicom network, contact your network service provider, or buy hosting with dual-line service or a CDN service.
5. UA ban: UA is the user agent (User-Agent); the server identifies visitors by their UA. When a website returns an abnormal page (such as a 403 or 500) or jumps to another page for visits from a specific UA, that is a UA ban. This setting is only appropriate when you do not want spiders to visit your website. If you do want spiders to visit, check whether any spider UA strings appear in your user-agent related settings and remove them promptly.
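A server-side UA check can be sketched as follows; "Baiduspider" is the token Baidu uses in its UA string, while the other tokens and the helper name are illustrative:

```python
# Substrings that identify common search-engine spiders in a
# User-Agent header; "Baiduspider" is Baidu's token, the rest are
# illustrative examples.
SPIDER_TOKENS = ("Baiduspider", "Googlebot", "bingbot")

def is_spider(user_agent):
    """Return True if the User-Agent string identifies a known spider."""
    return any(token.lower() in user_agent.lower() for token in SPIDER_TOKENS)

ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
print(is_spider(ua))                         # True
print(is_spider("Mozilla/5.0 (Windows)"))    # False
```

A UA ban is this same check wired to a block rule; auditing that rule's token list is how you find an accidental spider ban.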
6. Abnormal jump: a jump redirects a network request to another location; it becomes abnormal when it sends the spider or the user somewhere other than the content that was requested.
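One way to reason about jumps is to follow the chain and flag loops or excessive hops; the URL map below is an in-memory stand-in for real HTTP redirect responses:

```python
def follow_jumps(start, redirects, max_hops=5):
    """Follow a chain of jumps and return the final URL.

    `redirects` maps a URL to the URL it jumps to. A chain that loops
    or exceeds max_hops is treated as an abnormal jump and returns None.
    """
    seen = set()
    url = start
    for _ in range(max_hops):
        if url not in redirects:
            return url          # no further jump: final destination
        if url in seen:
            return None         # loop detected: abnormal jump
        seen.add(url)
        url = redirects[url]
    return None                 # too many hops: abnormal jump

jumps = {"/old": "/new", "/a": "/b", "/b": "/a"}
print(follow_jumps("/old", jumps))  # /new
print(follow_jumps("/a", jumps))    # None (redirect loop)
```

A single clean jump to the moved content is normal; chains that loop or never settle are the abnormal case.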
7. Dead links: pages that are invalid and cannot provide any valuable information to users are dead links, including protocol dead links (the server returns an error status such as 404) and content dead links (the page returns 200 but its content is gone or invalid). We recommend that sites use protocol dead links and submit them through the Baidu Webmaster Platform's dead-link tool, so Baidu can discover them faster and reduce their negative impact on users and search engines.
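Protocol dead links can be recognized directly from HTTP status codes; the exact set below is an assumption about which codes count, not an official list:

```python
# Status codes commonly treated as protocol dead links (an assumption;
# the canonical set is defined in Baidu's dead-link documentation).
DEAD_LINK_CODES = {403, 404, 410, 503}

def is_protocol_dead_link(status_code):
    """Return True if the HTTP status marks a protocol dead link."""
    return status_code in DEAD_LINK_CODES

print(is_protocol_dead_link(404))  # True
print(is_protocol_dead_link(200))  # False: a 200 page could still be a
                                   # content dead link, which needs a
                                   # content-level check instead
```

This is why protocol dead links are preferred: a crawler can detect them from the status code alone, without inspecting page content.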
8. Other abnormalities:
(1) Abnormality for Baidu referrals: the webpage returns content different from the normal page when the visit arrives via a Baidu referral.
(2) Abnormality for Baidu UA: the webpage returns content different from the original page when the visitor's UA is Baidu's.
(3) Abnormal JS jump: the webpage loads JS redirect code that Baidu cannot recognize, so users are redirected to another page after clicking through from the search results.
(4) Accidental ban caused by excessive pressure: Baidu automatically sets a reasonable crawling pressure based on the site's size and traffic. However, under abnormal conditions, such as when pressure control misbehaves, the server may impose a protective accidental ban based on its own load. In this case, return the status code 503 (which means "Service Unavailable"), so the spider will retry the link after a while; if the server is free by then, the crawl will succeed.
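Returning 503 under overload, as recommended above, can be sketched as a minimal WSGI app; the `is_overloaded` callable and the Retry-After value are hypothetical:

```python
def make_app(is_overloaded):
    """Build a WSGI app that serves 503 while the server is overloaded.

    `is_overloaded` is a hypothetical zero-argument callable that
    reports the server's current load state.
    """
    def app(environ, start_response):
        if is_overloaded():
            # 503 tells the spider the ban is temporary; Retry-After
            # hints when it should try crawling this link again.
            start_response("503 Service Unavailable",
                           [("Retry-After", "120"),
                            ("Content-Type", "text/plain")])
            return [b"Service temporarily unavailable"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello"]
    return app

# Exercise the app directly, without opening a network socket.
statuses = []
app = make_app(lambda: True)
app({}, lambda status, headers: statuses.append(status))
print(statuses[0])  # 503 Service Unavailable
```

The point is to signal a temporary condition: a 503 invites the spider back later, whereas a 404 or a hard connection refusal can look like a dead link or a ban.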