Python Crawler with the urllib Module: GET Requests

This program uses fetching the Baidu home page as its example.


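Format:

Import urllib.request

Open the page to fetch: response = urllib.request.urlopen('URL')  (the URL must include the scheme, e.g. http://)

Read the page source: html = response.read()  (returns bytes)

Print:

1. Without decode

print(html) # prints the bytes literal: the whole page on one line, with line breaks shown as \n, which is hard to read

2. With decode

print(html.decode()) # decodes the bytes to str (UTF-8 by default), so the page source prints line by line like well-formatted code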
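The difference between the two print calls can be reproduced without any network request. The following is a minimal sketch; `raw` is a hypothetical sample standing in for the bytes that `response.read()` would return, not real Baidu output.

```python
# Hypothetical sample standing in for the bytes returned by response.read()
raw = "<html>\n  <title>百度一下，你就知道</title>\n</html>".encode("utf-8")

# print(raw) shows the bytes literal, which is the same text as repr(raw)
as_bytes_text = repr(raw)

# decode() turns bytes into str; the default encoding is UTF-8
as_str_text = raw.decode()

print(as_bytes_text)  # one line, newlines shown as \n, non-ASCII shown as \x.. escapes
print(as_str_text)    # three lines, Chinese characters readable
```

The bytes form keeps everything on one line because line breaks are escaped, which is why the undecoded output looks cramped.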

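Checking the request result:

a. response.status # 200 means the request succeeded

b. response.getcode() # returns the same status code as response.status (deprecated since Python 3.9 in favor of status)

For a missing page (404), urlopen does not return a response; it raises urllib.error.HTTPError, and the status code is available as the exception's code attribute.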
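One way to obtain the status code for both successful and failed requests is to catch the exception that urlopen raises for error responses. This is a sketch only: `fetch_status` is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a GET request (hypothetical helper)."""
    try:
        # The with-statement closes the connection when the block ends
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses;
        # the status code is stored on the exception
        return err.code
```

Calling `fetch_status('http://www.baidu.com')` would be expected to return 200 when the site is reachable. Network failures such as DNS errors raise urllib.error.URLError and are deliberately not caught here.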

1. The program without decode:

import urllib.request

# The URL needs an explicit scheme; 'www.baidu.com' alone raises ValueError
response = urllib.request.urlopen('http://www.baidu.com')
html = response.read()  # raw page source as bytes
print(html)
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)

Output:

[Screenshot of the program output]

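2. The program with decode:

import urllib.request

# The URL needs an explicit scheme; 'www.baidu.com' alone raises ValueError
response = urllib.request.urlopen('http://www.baidu.com')
html = response.read()  # raw page source as bytes

print(html.decode())  # decode as UTF-8 (the default) before printing
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)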
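Calling decode() with no argument assumes the page is UTF-8. A slightly more defensive variant reads the charset from the Content-Type header when the server provides one. The sketch below is an assumption-laden illustration, not part of the original program: `fetch_text` and its `default_charset` parameter are hypothetical names.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8", timeout=10):
    """Fetch url with GET and return the body as str (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # get_content_charset() returns the charset declared in the
        # Content-Type header, or None if the header does not name one
        charset = response.headers.get_content_charset() or default_charset
        # errors="replace" avoids an exception on bytes that do not
        # match the chosen charset
        return response.read().decode(charset, errors="replace")
```

With this helper, `print(fetch_text('http://www.baidu.com'))` would play the role of `print(html.decode())` in the program above.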

Output (abridged):

百度一下，你就知道
html,body{height:100%}
.
.
.
.
------------------------------------------------------------------
------------------------------------------------------------------
------------------------------------------------------------------
200
