Using robots.txt to block crawlers from accessing pages


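I am trying to find out how to block crawlers from accessing links like this one:

site.com/something-search.html

I want to block every URL matching /something-*

Can anyone help me?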

In your robots.txt:

User-agent: *
Disallow: site.com/something-(1st link)
.
.
.
Disallow: site.com/something-(last link)

Add an entry for each page you don't want crawled.

Regular expressions are not allowed in robots.txt, although some smart crawlers understand them.

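The rule `Disallow: /something-` under `User-agent: *` blocks every URL whose path begins with `/something-`. For example, with the robots.txt served from http://example.com/robots.txt, these URLs are blocked: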

```
http://example.com/something-
http://example.com/something-foo
http://example.com/something-foo.html
http://example.com/something-foo/bar
```

The following URLs are still allowed:

```
http://example.com/something
http://example.com/something.html
http://example.com/something/
```
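The prefix behaviour described above can be checked locally. The following is a minimal sketch using Python's standard-library `urllib.robotparser`, which does plain prefix matching on the path. The agent name `ExampleBot` and the helper `is_allowed` are made up for illustration, and real crawlers may interpret rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Rules under test; example.com is only a placeholder host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /something-
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the (hypothetical) agent may fetch the URL."""
    return parser.can_fetch(agent, url)


blocked = [
    "http://example.com/something-",
    "http://example.com/something-foo",
    "http://example.com/something-foo.html",
    "http://example.com/something-foo/bar",
]
allowed = [
    "http://example.com/something",
    "http://example.com/something.html",
    "http://example.com/something/",
]

for url in blocked + allowed:
    print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Running this against your own rules before deploying them is a cheap way to catch a rule that matches more or less than intended.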

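The problem is that I don't know what the first and last links are. These are my search results pages, and I want to stop crawlers from accessing them. I tried: `Disallow: /search-*`. "Wildcards (such as "*") are not allowed here. The line below must be an allow, disallow, comment, or blank line statement."

The original robots.txt standard does not support `*` in `Disallow` values.

A `Disallow` value cannot contain the host (site.com).

This question appears to be off-topic because it is about SEO.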
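To see why the host and the `*` cause trouble, here is a small sketch that feeds three variants of the rule to Python's `urllib.robotparser`. This parser treats each `Disallow` value as a literal path prefix, so it is only a stand-in for a strict crawler; crawlers that support wildcard extensions would handle the `*` variant differently. The helper `may_fetch` and the agent name `ExampleBot` are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Parse the given robots.txt text and test one URL against it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "http://site.com/something-search.html"

# Host included in the value: never matches, because only the path is compared.
with_host = "User-agent: *\nDisallow: site.com/something-\n"
# Trailing "*": taken literally by this parser, so it does not match either.
with_star = "User-agent: *\nDisallow: /something-*\n"
# Plain path prefix: matches every path that starts with /something-.
prefix_only = "User-agent: *\nDisallow: /something-\n"

for label, rules in [("host", with_host), ("star", with_star), ("prefix", prefix_only)]:
    print(label, "->", "allowed" if may_fetch(rules, URL) else "blocked")
```

With this parser, only the plain prefix form actually blocks the search page.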

The complete robots.txt for the /something- rule:

User-agent: *
Disallow: /something-