Follow these steps to set up a Python web crawler project on Ubuntu:
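Environment setup
    # Update the package index and upgrade installed packages
    sudo apt update && sudo apt upgrade -y
    # Install Python 3, pip, and the venv module (packaged separately on Ubuntu)
    sudo apt install python3 python3-pip python3-venv
    # Create and activate a virtual environment
    python3 -m venv .venv
    source .venv/bin/activate
Install the crawler dependencies
    pip install requests beautifulsoup4 lxml
Write the crawler code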
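Create a Python file (for example, spider.py). Example code:
    import requests
    from bs4 import BeautifulSoup

    url = 'http://example.com'
    # Set a timeout so the request cannot hang indefinitely
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        # Parse the HTML and print the text of the <title> element
        soup = BeautifulSoup(response.text, 'html.parser')
        print(soup.title.string)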
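The example above assumes that the request succeeds and that the page contains a <title> element. A slightly more defensive variant might look like the sketch below. The helper names extract_title and fetch_title and the 10-second timeout are illustrative assumptions, not part of the original script. Parsing is kept separate from fetching so that it can be checked without network access.

```python
from typing import Optional

import requests
from bs4 import BeautifulSoup


def extract_title(html: str) -> Optional[str]:
    """Return the page title, or None if there is no usable <title>."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    # strip=True trims surrounding whitespace; an empty title becomes None
    return soup.title.get_text(strip=True) or None


def fetch_title(url: str, timeout: float = 10.0) -> Optional[str]:
    """Download url and return its title, or None on any request error."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as errors
    except requests.RequestException:
        return None
    return extract_title(response.text)


if __name__ == "__main__":
    print(fetch_title("http://example.com"))
```

Returning None instead of raising keeps the calling code short. A larger crawler would probably log the exception rather than discard it.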
Run the crawler
    python spider.py
Advanced configuration (optional)
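Deployment
Use a service manager (for example, systemd) to start the crawler automatically at boot.
Notes: follow the rules in the target site's robots.txt, avoid sending requests at a high rate, and prefer a framework such as Scrapy to improve efficiency.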
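The notes on robots.txt and request rate can be sketched with the standard library alone. In the example below, the robots.txt content, the crawler name my-spider, and the Throttle class are all hypothetical and exist only for illustration. The rules are inlined so the sketch runs offline. A real crawler would instead call set_url() with the site's robots.txt address and then read() on the parser.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "my-spider"  # assumed crawler name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        """Sleep just long enough to keep `delay` seconds between calls."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Before each request, a crawler could check parser.can_fetch(USER_AGENT, url) and call throttle.wait(). The delay would come from parser.crawl_delay(USER_AGENT) when the site declares one, with a default chosen by the crawler's author otherwise.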