python-flask/project09/03.[任务书]爬虫案例.md
2025-09-11 16:52:33 +08:00

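# Web Scraper Example
## Requirements
- Wallpaper site to scrape: https://www.bizhihui.com/dongman/1/?order=time
- The `1` in the URL is the page number. Replacing it with another number gives that page.
- Write a program that, when run, asks how many pages of images to fetch and then automatically downloads the images on those pages into an `images` folder.
## Reference Code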
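The page-number rule above comes down to substituting a number into the URL. Here is a minimal sketch, assuming the URL pattern stays exactly as shown. The helper names `read_page_count` and `build_page_urls` are hypothetical and are not part of the reference code:

```python
def read_page_count(text: str) -> int:
    """Turn the user's answer into a page count, rejecting non-numbers and values below 1."""
    try:
        pages = int(text.strip())
    except ValueError:
        raise ValueError(f"Expected a whole number, got {text!r}") from None
    if pages < 1:
        raise ValueError(f"Page count must be at least 1, got {pages}")
    return pages


def build_page_urls(pages: int) -> list[str]:
    """Return the listing-page URLs for pages 1..pages by substituting the page number."""
    return [f"https://www.bizhihui.com/dongman/{n}/?order=time" for n in range(1, pages + 1)]


# Example: the URLs the scraper would visit if the user answered "3"
for url in build_page_urls(read_page_count("3")):
    print(url)
```

Calling `int()` inside `try` instead of checking `str.isdigit()` first also rejects strings such as `"²"`. `isdigit()` accepts these, but `int()` cannot convert them.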
```python
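from urllib.request import urlopen, Request, build_opener, install_opener, HTTPSHandler
from urllib.parse import urljoin
import re
import os
import ssl

# Create an HTTPS handler that skips certificate verification
# (this disables SSL security checks; acceptable only for this exercise)
context = ssl._create_unverified_context()
# Build an opener object that uses this handler
opener = build_opener(HTTPSHandler(context=context))
# Install the opener globally so that urlopen() uses it
install_opener(opener)

# Identify as a regular browser so the site does not reject the requests
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'

pages = input("How many pages do you want to download: ")
if pages.isdigit():
    pages = int(pages)
else:
    # Raising a plain string is a TypeError in Python 3, so raise a real exception
    raise ValueError("Your input is not a number!")

tmp = []  # collected (image URL, image title) pairs
for i in range(1, pages + 1):
    url = f'https://www.bizhihui.com/dongman/{i}/?order=time'
    req = Request(url, headers={'User-Agent': user_agent})
    content = urlopen(req).read().decode('utf-8')
    # Each match is a (src, alt) tuple taken from the thumbnail <img> tags
    link = re.findall('<img src="(.*?)" alt="(.*?)" width="450" height="285">', content)
    # Resolve relative src values against the page URL (absolute ones are unchanged)
    tmp.extend((urljoin(url, src), alt) for src, alt in link)

# Create the images folder in the current working directory if it does not exist
path = os.path.join(os.getcwd(), 'images')
os.makedirs(path, exist_ok=True)

for src, alt in tmp:
    # Use the alt text as the file name; replace whitespace and characters
    # that are not allowed in file names with underscores
    jpgname = re.sub(r'[\\/:*?"<>|\s]', '_', alt)
    req = Request(src, headers={'User-Agent': user_agent})
    data = urlopen(req).read()
    with open(os.path.join(path, f'{jpgname}.jpg'), 'wb') as f:
        f.write(data)
    print(jpgname, "downloaded")

input("Download complete. Press Enter to exit...")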
```
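To see what the `re.findall` call returns without touching the network, you can run the same pattern against a hand-written HTML fragment. The sample markup, the image URLs, and the helper name `safe_filename` are made up for illustration. The real page's HTML may differ, and if it does, the pattern simply returns an empty list. The sketch also uses `urllib.parse.urljoin` to turn a relative `src` into an absolute URL and shows one way to clean the `alt` text into a file name:

```python
import re
from urllib.parse import urljoin

# Same pattern as the reference code: group 1 = src, group 2 = alt
PATTERN = '<img src="(.*?)" alt="(.*?)" width="450" height="285">'

# Hypothetical HTML fragment imitating part of one listing page
sample_html = '''
<div class="item"><img src="https://img.example.com/a.jpg" alt="Night City" width="450" height="285"></div>
<div class="item"><img src="/upload/b.jpg" alt="Sky/Sea" width="450" height="285"></div>
<img src="/logo.png" alt="logo" width="120" height="40">
'''


def safe_filename(title: str) -> str:
    """Replace whitespace and characters Windows forbids in file names with '_' (hypothetical helper)."""
    return re.sub(r'[\\/:*?"<>|\s]', '_', title)


page_url = 'https://www.bizhihui.com/dongman/1/?order=time'
matches = re.findall(PATTERN, sample_html)
for src, alt in matches:
    print(urljoin(page_url, src), '->', safe_filename(alt) + '.jpg')
# https://img.example.com/a.jpg -> Night_City.jpg
# https://www.bizhihui.com/upload/b.jpg -> Sky_Sea.jpg
```

Only the first two `<img>` tags match. The logo is skipped because the pattern requires exactly `width="450" height="285"`. For the same reason, the scraper silently downloads nothing if the site changes its thumbnail size.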
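><span style="color: red; background: yellow; padding: 2px 5px; font-size: 22px;">Homework 9.1: What to Submit</span>
>
>- Understand how the program works.
>- Take a screenshot of the program running successfully and send it to your group leader separately.
>- Also take a screenshot of the wallpapers the scraper downloaded and send it to your group leader.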