# Web Scraping Example

## Requirements
- Wallpaper source site: https://www.bizhihui.com/dongman/1/?order=time
- The `1` in the URL is the page number: it selects the first page here, and changing it to another number selects a different page (see the short sketch after this list).
- Write a program that, when run, asks how many pages of images to fetch, then automatically scrapes the images on those pages into an `images` folder.
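The page-number substitution can be checked with the minimal sketch below. It is purely illustrative (the page count of 3 is an assumed example value) and only prints the URLs; it does not download anything.

```python
# Minimal sketch: build the listing URL for each requested page.
# The page count (3) is an assumed example value, not part of the assignment.
for page in range(1, 3 + 1):
    url = f'https://www.bizhihui.com/dongman/{page}/?order=time'
    print(url)  # e.g. https://www.bizhihui.com/dongman/1/?order=time
```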
## Reference Code
```python
from urllib.request import urlopen, Request, build_opener, install_opener, HTTPSHandler
import re, os
import ssl

# Create an SSL context that skips certificate verification
# (ssl._create_unverified_context is a private helper, acceptable for a demo)
context = ssl._create_unverified_context()

# Build an opener whose HTTPS handler uses that context
opener = build_opener(HTTPSHandler(context=context))

# Install the opener globally so every urlopen() call uses it
install_opener(opener)

# Collected (image URL, image title) tuples from all pages
tmp = []

pages = input("How many pages do you want to download: ")
if pages.isdigit():
    pages = int(pages)
else:
    raise ValueError("Your input is not a number!")

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'

for i in range(1, pages + 1):
    url = f'https://www.bizhihui.com/dongman/{i}/?order=time'
    req = Request(url, headers={'User-Agent': user_agent})

    content = urlopen(req).read().decode('utf-8')
    # Each match is a (src, alt) tuple extracted from the page's <img> tags
    link = re.findall('<img src="(.*?)" alt="(.*?)" width="450" height="285">', content)
    tmp.extend(link)

# Make sure the images folder exists
path = os.getcwd() + os.sep + 'images'
if not os.path.isdir(path):
    os.makedirs(path)

# Download every image found above
for i in tmp:
    jpgname = i[-1]
    jpgname = jpgname.replace(' ', '_')
    req = Request(i[0], headers={'User-Agent': user_agent})
    data = urlopen(req).read()
    with open(f'{path}{os.sep}{jpgname}.jpg', 'wb') as f:
        f.write(data)
    print(jpgname, "downloaded")

input("Download complete, press Enter to exit....")
```
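To help with "understand how the program works", the sketch below isolates the extraction step: it runs the same `re.findall` pattern against a hard-coded HTML snippet (the snippet and URLs are invented for illustration, not taken from the real site) so you can see the `(src, alt)` tuples the reference code collects into `tmp`.

```python
import re

# Invented sample markup shaped like the tags the pattern expects.
sample_html = '''
<img src="https://example.com/a.jpg" alt="Blue Sky" width="450" height="285">
<img src="https://example.com/b.jpg" alt="Night City" width="450" height="285">
'''

pattern = '<img src="(.*?)" alt="(.*?)" width="450" height="285">'
for src, alt in re.findall(pattern, sample_html):
    print(src, '->', alt)  # each match is a (src, alt) tuple
```

Note that the pattern depends on the exact attribute order and the fixed width/height values, so it will silently miss images if the site changes its markup; an HTML parser would be more robust, but the regex keeps the reference code free of third-party dependencies.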
><span style="color: red; background: yellow; padding: 2px 5px; font-size: 22px;">What to submit for Assignment 9.1</span>
>
>- Understand how the program works
>- A screenshot showing the program ran successfully, sent individually to your group leader
>- Also send your group leader a screenshot of the wallpapers the scraper downloaded