Web Crawling: Setting 'User-Agent' on a Request Object
A crawler can disguise itself as a browser by setting the 'User-Agent' header on a Request object. There are two common ways to do this with urllib.
Method 1: pass the headers when constructing the Request
from urllib import parse, request

url = 'http://www.httpbin.org/post'
headers = {
    # Spoof a Firefox User-Agent so the server sees a browser-like request
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0',
    'Host': 'httpbin.org'
}
form = {  # avoid naming this 'dict', which shadows the built-in
    'hello': 'world'
}
data = bytes(parse.urlencode(form), encoding='utf8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
Output:
{"args":{},"data":"","files":{},"form":{"hello":"world"},"headers":{"Accept-Encoding":"identity","Connection":"close","Content-Length":"11","Content-Type":"application/x-www-form-urlencoded",
"Host":"httpbin.org","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0"},"json":null,"origin":"118.113.89.18","url":"http://httpbin.org/post"}
Method 2: call add_header() after constructing the Request
from urllib import parse, request

url = 'http://www.httpbin.org/post'
form = {
    'hello': 'world'
}
data = bytes(parse.urlencode(form), encoding='utf8')
req = request.Request(url=url, data=data, method='POST')
# Set the spoofed User-Agent on the already-constructed Request
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
Output:
{"args":{},"data":"","files":{},"form":{"hello":"world"},"headers":{"Accept-Encoding":"identity","Connection":"close","Content-Length":"11","Content-Type":"application/x-www-form-urlencoded",
"Host":"www.httpbin.org","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0"},"json":null,"origin":"118.113.89.18","url":"http://www.httpbin.org/post"}