在 Ubuntu 上使用 MariaDB 进行数据分析的实操指南
一 环境准备与安装
sudo apt update && sudo apt install mariadb-serversudo mysql_secure_installation(设置 root 密码、移除匿名用户、禁止远程 root 登录等)sudo systemctl start mariadb、sudo systemctl enable mariadb、sudo systemctl status mariadbsudo service mysql start|restart|statusmariadb -u root -pSHOW DATABASES;、SELECT VERSION();/etc/mysql/mariadb.conf.d/50-server.cnf:
bind-address = 0.0.0.0port = 3307(示例)sudo service mysql restart)二 数据导入与导出
mysql -u user -p database_name < backup.sqlUSE db; SOURCE /path/file.sql;mysqldump -u user -p database_name > backup.sqlmysqldump -h remote_IP -u user -p -P 3306 database_name > backup.sqlSELECT ... FROM table INTO OUTFILE '/tmp/table.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';LOAD DATA INFILE '/tmp/table.csv' INTO TABLE table FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';secure_file_priv 配置允许该路径docker run -d --name mymariadb -p 3307:3306 \ -e MYSQL_ROOT_PASSWORD=123456 \ -e MARIADB_USER=datalab -e MARIADB_PASSWORD=123456 \ -e MARIADB_DATABASE=testdb \ -v /home/datalab/mariadb_vol/mysql_data:/var/lib/mysql \ -v /home/datalab/mariadb_vol/mysql_conf:/etc/mysql/conf.d \ mariadbdocker exec -it mymariadb mariadb -u root -p三 数据分析常用 SQL 范式
SELECT col1, col2 FROM t WHERE cond;SELECT category, COUNT(*), SUM(amount) FROM t GROUP BY category HAVING SUM(amount) > 1000;ORDER BY col DESC LIMIT 10 OFFSET 20;SELECT DATE_FORMAT(event_time, '%Y-%m') AS ym, COUNT(*) AS cnt FROM t GROUP BY ym;SELECT * FROM t WHERE event_time >= CURDATE() - INTERVAL 30 DAY;SELECT date, amount, SUM(amount) OVER (ORDER BY date) AS running_total FROM sales;SELECT date, category, amount, SUM(amount) OVER (PARTITION BY category ORDER BY date) AS cat_running FROM sales;SELECT date, amount, AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma7 FROM sales;SELECT year, month, revenue, LAG(revenue, 12) OVER (ORDER BY year, month) AS prev_year, ROUND((revenue - LAG(revenue, 12) OVER (ORDER BY year, month)) / LAG(revenue, 12) OVER (ORDER BY year, month), 4) AS yoy FROM sales;SUM(CASE WHEN region='North' THEN amount ELSE 0 END) AS north_salesROW_NUMBER()/RANK()/DENSE_RANK() OVER (PARTITION BY dept ORDER BY score DESC)LEFT JOIN 后再做窗口计算四 与 Python 协同分析
pip install mariadbconn = mariadb.connect(user='datalab', password='123456', host='localhost', port=3306, database='testdb')cursor = conn.cursor()cursor.execute("SELECT id, amount FROM sales WHERE sale_date >= CURDATE() - INTERVAL 7 DAY")rows = cursor.fetchall()data = [(v1, v2), (v3, v4)]cursor.executemany("INSERT INTO sales (col1, col2) VALUES (?, ?)", data)conn.commit()cursor.close(); conn.close()五 性能与运维要点
innodb_buffer_pool_size(如系统内存的 50%–70%)sort_buffer_size、tmp_table_size、max_heap_table_sizemax_connections、innodb_thread_concurrencyEXPLAIN 检查执行计划ANALYZE TABLE table_name; 更新索引统计,提升执行计划质量SELECT 权限;导出/导入使用受限账号与专用目录