Setting Up a Python Data Analysis Environment on Ubuntu
1 Basic Environment Setup
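Install Python and pip, together with python3-venv, which Ubuntu needs before python3 -m venv can create an environment:
    sudo apt update && sudo apt install -y python3 python3-pip python3-venv
Create a virtual environment and activate it:
    python3 -m venv ~/venvs/data310
    source ~/venvs/data310/bin/activate
Upgrade pip inside the environment:
    pip install -U pip
Set the Aliyun mirror as pip's default package index:
    pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
Check the result: python -V and pip -V should report Python 3.x and the matching pip version.

2 Installing Common Data Analysis Libraries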
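Core libraries:
    pip install numpy pandas matplotlib seaborn scipy scikit-learn jupyter
ReportLab, for generating PDF reports:
    pip install reportlab
A deep learning framework:
    pip install tensorflow
    (or: pip install torch)
Verify the installation by printing the installed versions. The Jupyter version is read from jupyter_core, because the top-level jupyter module does not define __version__:
    python - <<'PY'
    import sys, numpy, pandas, matplotlib, seaborn, scipy, sklearn, jupyter_core
    print("Python:", sys.version)
    print("NumPy:", numpy.__version__, "Pandas:", pandas.__version__)
    print("Matplotlib:", matplotlib.__version__, "Seaborn:", seaborn.__version__)
    print("SciPy:", scipy.__version__, "Scikit-learn:", sklearn.__version__)
    print("Jupyter core:", jupyter_core.__version__)
    PY
When copying this block, remove the leading indentation so that the closing PY marker starts at the beginning of its line.

3 Using Anaconda as an All-in-One Alternative (Optional)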
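Download the installer:
    wget https://repo.anaconda.com/archive/Anaconda3-2024.05-Linux-x86_64.sh
Run it and follow the prompts to complete the installation:
    bash Anaconda3-2024.05-Linux-x86_64.sh
Reload the shell configuration so that conda is on the PATH:
    source ~/.bashrc
Create and activate an environment:
    conda create -n data310 python=3.10 -y
    conda activate data310
Install the data analysis libraries from conda-forge:
    conda install -c conda-forge numpy pandas matplotlib seaborn scipy scikit-learn jupyter
Install the Spyder IDE:
    conda install -c conda-forge spyder

4 Quick Verification and Common Operations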
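The commands in this section read a file named data.csv and plot its value column against its date column. If no such file is at hand, the following minimal sketch generates one for testing. The helper name write_sample_csv, the date range, and the values are hypothetical; only the two column names are taken from the commands in this section.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_sample_csv(path, periods=30):
    """Write a small CSV with 'date' and 'value' columns (made-up test data)."""
    frame = pd.DataFrame(
        {
            "date": pd.date_range("2024-01-01", periods=periods, freq="D"),
            "value": range(periods),
        }
    )
    frame.to_csv(path, index=False)
    return frame


# Written to a temporary directory here so that no existing file is overwritten.
# Pass "data.csv" instead to create the file the commands in this section read.
sample_path = Path(tempfile.mkdtemp()) / "data.csv"
write_sample_csv(sample_path)

# Read the file back, parsing the date column as datetimes.
df = pd.read_csv(sample_path, parse_dates=["date"])
print(df.head())
```

Parsing the date column with parse_dates is optional, but it usually gives the line plot a proper time axis.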
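Start Jupyter Notebook:
    jupyter notebook
Load a CSV file and inspect it:
    import pandas as pd
    df = pd.read_csv('data.csv')
    print(df.head())
    print(df.describe())
Plot a line chart:
    import matplotlib.pyplot as plt
    df.plot(x='date', y='value', kind='line')
    plt.show()
To read from MySQL, first install the driver and SQLAlchemy:
    pip install pymysql sqlalchemy
Then create an engine and run a query:
    from sqlalchemy import create_engine
    engine = create_engine('mysql+pymysql://user:password@host:3306/db')
    df = pd.read_sql('SELECT * FROM table_name', engine)

5 Common Problems and Optimization Tips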
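- Permissions: avoid sudo pip; install packages inside the virtual environment instead.
- Slow installs: a nearby package mirror can speed up installation significantly; pip cache purge clears pip's local download cache.
- Sharing environments: export the environment with conda env export > environment.yml.
- Remote servers: start Jupyter with --no-browser and reach it through SSH tunnel port forwarding.
- Large datasets: read the data in chunks (chunksize), and consider Dask or Polars if needed.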
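The chunked-reading tip can be sketched as follows. This is a minimal illustration, not a tuned pipeline: the in-memory CSV text stands in for a large file, and the chunk size of 4 is chosen only to make the chunking visible. Real workloads would typically use a file path and a much larger chunksize.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large CSV file.
csv_text = "date,value\n" + "\n".join(
    f"2024-01-{day:02d},{day * 10}" for day in range(1, 11)
)

total = 0
row_count = 0
# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file into memory at once.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()
    row_count += len(chunk)

# Combine the per-chunk sums into a global mean.
mean_value = total / row_count
print(f"rows={row_count}, mean={mean_value}")
```

Running totals work for sums, counts, and means. Statistics that need the full data at once, such as an exact median, do not split across chunks this easily, and that is where a library such as Dask or Polars may be the better fit.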