温馨提示×

温馨提示×

您好,登录后才能下订单哦!

密码登录×
登录注册×
其他方式登录
点击 登录注册 即表示同意《亿速云用户服务条款》

【MySQL】load data语句详解(二)

发布时间:2020-08-05 07:21:23 来源:ITPUB博客 阅读:777 作者:沃趣科技 栏目:MySQL数据库
作者:罗小波
沃趣科技高级MySQL数据库工程师

1.2.6. FIELDS(与COLUMNS关键字相同)和LINES子句
  • 以下示例中的char代表单个字符,string代表字符串(即多个字符),load data语句中,转义字符和字段引用符只能使用单个字符,字段分隔符、行分隔符、行前缀字符都可以使用多个字符(字符串)
  • 对于LOAD DATA INFILE和SELECT … INTO OUTFILE语句中,FIELDS和LINES子句的语法完全相同。两个子句在LOAD DATA INFILE和SELECT … INTO OUTFILE语句中都是可选的,但如果两个子句都被指定,则FIELDS必须在LINES之前,否则报语法错误

    • FIELDS关键字共有三个子句,TERMINATED BY 'string'指定字段分隔符,[OPTIONALLY] ENCLOSED BY 'char'指定字段引用符(其中使用了OPTIONALLY关键字之后,只在char、varchar和text等字符型字段上加字段引用符,数值型的不会加字段引用符,且OPTIONALLY 关键字只在导出数据时才起作用,导入数据时用于不用对数据没有影响 ),ESCAPED BY 'char'指定转义符,如果您指定了一个FIELDS子句,则它的每个子句也是可选的,但在你指定了FIELDS关键字之后,这个关键字的子句至少需要指定一个,后续章节会进行举例说明
    • LINES关键字共有两个子句,STARTING BY 'string'指定行前缀字符,TERMINATED BY 'string'指定行分隔符(换行符),如果你指定了LINES关键字,则LINES的子句都是可选的,但在你指定了LINES关键字之后,这个关键字的子句至少需要指定一个,后续章节会进行举例说明
    • 如果在导入和导出时没有指定FIELDS和LINES子句,则导入和导出时两个子句的默认值相同,默认的字段分隔符为\t,行分隔符为\n(win上默认为\r\n,记事本程序上默认为\r),字段引用符为空,行前缀字符为空
  • 当mysql server导出文本数据到文件时,FIELDS和LINES默认值时SELECT … INTO OUTFILE在输出文本数据时行为如下:

    • 在文本数据各字段之间使用制表符来作为字段分隔符
    • 不使用任何引号来包围文本数据的各字段值,即字段引用符为空
    • 使用\转义在字段值中出现的制表符\t,换行符\n或转义符本身\等特殊字符(即输出的文本数据中对这些特殊字符前多加一个反斜杠)
    • 在行尾写上换行符\n,即使用\n作为行分隔符(换行符)
    • 注意:如果您在Windows系统上生成了文本文件,则可能必须使用LINES TERMINATED BY '\r\n'来正确读取文件,因为Windows程序通常使用两个字符作为行终止符。某些程序(如写字板)在写入文件时可能会使用\r作为行终止符(要读取这些文件,请使用LINES TERMINATED BY '\r')
    • FIELDS和LINES子句默认值时生成的纯文本数据文件可以使用python代码来读取文件查看文件中的制表符和换行符(linux下的cat和vim等编辑器默认会解析\t为制表符,\n为换行符,所以使用这些命令可能无法看到这些特殊符号)
      1. >>> f = open('/tmp/test3.txt','r')
      2. >>> data = f.readlines()
      3. >>> data
      4. ['2,"a string","100.20"\n', '4,"a string containing a , comma","102.20"\n', '6,"a string containing a \\" quote","102.20"\n', '8,"a string containing a \\", quote and comma","102.20"\n']
      5. >>> for i in data:
      6. ... print i,
      7. ...
      8. 2,"a string","100.20"
      9. 4,"a string containing a , comma","102.20"
      10. 6,"a string containing a \" quote","102.20"
      11. 8,"a string containing a \", quote and comma","102.20"
  • 当mysql server从文本文件读取数据时,FIELDS和LINES默认值会导致LOAD DATA INFILE的行为如下:
    • 寻找换行边界字符\n来进行换行
    • 不跳过行前缀,把行前缀也当作数据(发生在如果导出数据时使用了行前缀,导入时没有指定正确的行前缀或者根本没有指定行前缀选项时)
    • 使用制表符\t来分割一行数据中的各列
    • 要注意:在FIELDS和LINES的默认值下,在解析文本文件时不会把字符串之间的引号当作真正的引号,而是当作数据
1.2.6.1. FIELDS关键字及其子句详解
  • 字段分隔符,默认是\t,使用子句 fields terminated by 'string' 指定,其中string代表指定的字段分隔符
    1. admin@localhost : xiaoboluo 03:08:34> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',';
    2. Query OK, 4 rows affected (0.00 sec)
    3. admin@localhost : xiaoboluo 03:08:37> system cat /tmp/test3.txt
    4. 2,a string,100.20
    5. 4,a string containing a \, comma,102.20
    6. 6,a string containing a " quote,102.20
    7. 8,a string containing a "\, quote and comma,102.20
  • 字段引用符,如果加optionally选项则只用在char、varchar和text等字符型字段上,数值类型会忽略使用引用符,如果不指定该子句,则默认不使用引用符,使用子句fields [optionally] enclosed by 'char'指定,其中char代表指定的字段引用符
    1. # 指定字段引用符为",不使用optionally关键字
    2. admin@localhost : xiaoboluo 03:33:33> system rm -f /tmp/test3.txt;
    3. admin@localhost : xiaoboluo 03:37:21> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '"';
    4. Query OK, 5 rows affected (0.00 sec)
    5. admin@localhost : xiaoboluo 03:37:33> system cat /tmp/test3.txt
    6. "2" "a string" "100.20"
    7. "4" "a string containing a , comma" "102.20"
    8. "6" "a string containing a \" quote" "102.20"
    9. "8" "a string containing a \", quote and comma" "102.20"
    10. "10" "\\t" "102.20"
    11. # 指定字段引用符为",使用optionally关键字,可以看到id列的字段引用符去掉了
    12. admin@localhost : xiaoboluo 03:37:41> system rm -f /tmp/test3.txt;
    13. admin@localhost : xiaoboluo 03:40:53> select * from test3 into outfile "/tmp/test3.txt" FIELDS optionally ENCLOSED BY '"';
    14. Query OK, 5 rows affected (0.00 sec)
    15. admin@localhost : xiaoboluo 03:41:03> system cat /tmp/test3.txt
    16. 2 "a string" "100.20"
    17. 4 "a string containing a , comma" "102.20"
    18. 6 "a string containing a \" quote" "102.20"
    19. 8 "a string containing a \", quote and comma" "102.20"
    20. 10 "\\t" "102.20
  • 转义字符,默认为\,使用子句fields escaped by 'char' 指定,其中char代表指定的转义字符
    1. admin@localhost : xiaoboluo 03:42:41> system rm -f /tmp/test3.txt;
    2. admin@localhost : xiaoboluo 03:44:18> select * from test3 into outfile "/tmp/test3.txt" fields escaped by '.';
    3. Query OK, 5 rows affected (0.00 sec)
    4. admin@localhost : xiaoboluo 03:44:25> system cat /tmp/test3.txt # 可以看到数据中指定的转义符.号被转义了,而数据\t没有被转义
    5. 2 a string 100..20
    6. 4 a string containing a , comma 102..20
    7. 6 a string containing a " quote 102..20
    8. 8 a string containing a ", quote and comma 102..20
    9. 10 \t 102..20
    10. admin@localhost : xiaoboluo 03:44:28> truncate test3; #清空表
    11. Query OK, 0 rows affected (0.01 sec)
    12. admin@localhost : xiaoboluo 03:45:19> load data infile "/tmp/test3.txt" into table test3 fields escaped by '.'; #导入数据时指定转义符为.
    13. Query OK, 5 rows affected (0.00 sec)
    14. Records: 5 Deleted: 0 Skipped: 0 Warnings: 0
    15. admin@localhost : xiaoboluo 03:45:40> select * from test3; #校验数据,可以看到导入数据正常
    16. +----+------------------------------------------+--------+
    17. | id | test | test2 |
    18. +----+------------------------------------------+--------+
    19. | 2 | a string | 100.20 |
    20. | 4 | a string containing a , comma | 102.20 |
    21. | 6 | a string containing a " quote | 102.20 |
    22. | 8 | a string containing a ", quote and comma | 102.20 |
    23. | 10 | \t | 102.20 |
    24. +----+------------------------------------------+--------+
    25. 5 rows in set (0.00 sec)
1.2.6.2. LINES 关键字及其子句详解
  • 行前缀字符串,使用子句lines starting by 'string' 指定,其中string代表指定的行前缀字符串,行前缀字符串在导出文本数据时使用该子句指定,在导入文本时在一行数据中如果发现了行前缀字符串,则只导入从前缀字符串开始之后的数据部分,前缀字符本身及其之前的数据被忽略掉,如果某行数据不包含行前缀字符串,则整行数据都会被忽略

如果您想要读取的纯文本文件中所有行都有一个您想要忽略的公用前缀,则可以使用LINES STARTING BY'prefix_string'来跳过这个前缀,以及前缀字符前面的任何内容。如果某行数据不包含前缀字符,则跳过整行内容,例

    1. # load data语句如下
    2. admin@localhost : xiaoboluo 03:48:04> system rm -f /tmp/test3.txt;
    3. admin@localhost : xiaoboluo 03:54:54> select * from test3 into outfile "/tmp/test3.txt" LINES STARTING BY 'xxx';
    4. Query OK, 5 rows affected (0.00 sec)
    5. admin@localhost : xiaoboluo 03:55:03> system cat /tmp/test3.txt #可以看到每行数据前面多了个行前缀字符串xxx
    6. xxx2 a string 100.20
    7. xxx4 a string containing a , comma 102.20
    8. xxx6 a string containing a " quote 102.20
    9. xxx8 a string containing a ", quote and comma 102.20
    10. xxx10 \\t 102.20
    11. # 现在,到shell命令行去修改一下,增加两行
    12. admin@localhost : xiaoboluo 03:55:50> system cat /tmp/test3.txt # 最后要加载的纯文本数据内容如下
    13. xxx2 a string 100.20
    14. xxx4 a string containing a , comma 102.20
    15. xxx6 a string containing a " quote 102.20
    16. xxx8 a string containing a ", quote and comma 102.20
    17. xxx10 \\t 102.20
    18. 12 \\t 102.20
    19. dfadsfasxxx14 \\t 102.20
    20. admin@localhost : xiaoboluo 03:59:03> truncate test3; #清空表
    21. Query OK, 0 rows affected (0.01 sec)
    22. admin@localhost : xiaoboluo 03:59:38> load data infile "/tmp/test3.txt" into table test3 LINES STARTING BY 'xxx'; #导入数据,指定行前缀字符为xxx
    23. Query OK, 6 rows affected (0.00 sec)
    24. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    25. admin@localhost : xiaoboluo 03:59:44> select * from test3; #校验表数据,可以看到没有xxx行前缀的行被忽略了,而包含xxx的最后一行,从xxx开始截断,xxx字符本身及其之前的内容被忽略,\
    26. xxx之后的内容被解析为行数据导入了
    27. +----+------------------------------------------+--------+
    28. | id | test | test2 |
    29. +----+------------------------------------------+--------+
    30. | 2 | a string | 100.20 |
    31. | 4 | a string containing a , comma | 102.20 |
    32. | 6 | a string containing a " quote | 102.20 |
    33. | 8 | a string containing a ", quote and comma | 102.20 |
    34. | 10 | \t | 102.20 |
    35. | 14 | \t | 102.20 |
    36. +----+------------------------------------------+--------+
    37. 6 rows in set (0.00 sec)
    38. 行结束符(换行符),linux下默认为\n,使用子句lines terminated by 'string' 指定,其中string代表指定的换行符
    39. # 指定换行符为\r\n导出数据
    40. admin@localhost : xiaoboluo 03:59:49> system rm -f /tmp/test3.txt;
    41. admin@localhost : xiaoboluo 04:02:22> select * from test3 into outfile "/tmp/test3.txt" lines terminated by '\r\n';
    42. Query OK, 6 rows affected (0.00 sec)
    43. # 由于linux的一些命令本身会解析掉这些特殊字符,所以使用python来查看这个文本文件中的换行符,从下面的结果中可以看到,列表的每一个元素代表一行数据,每一个元素的\
    44. 末尾的\r\n就是这行数据的换行符
    45. >>> f = open('/tmp/test3.txt','r')
    46. >>> data = f.readlines()
    47. >>> data
    48. ['2\ta string\t100.20\r\n', '4\ta string containing a , comma\t102.20\r\n', '6\ta string containing a " quote\t102.20\r\n', '8\ta string containing a ", quote and comma\t102.20\r\n', '10\t\\\\t\t102.20\r\n', \
    49. '14\t\\\\t\t102.20\r\n']
    50. >>>
    51. # 现在,把数据重新导入表,从下面的结果中可以看到,导入表中的数据正确
    52. admin@localhost : xiaoboluo 04:02:39> truncate test3;
    53. Query OK, 0 rows affected (0.01 sec)
    54. admin@localhost : xiaoboluo 04:04:55> load data infile "/tmp/test3.txt" into table test3 lines terminated by '\r\n';
    55. Query OK, 6 rows affected (0.00 sec)
    56. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    57. admin@localhost : xiaoboluo 04:05:11> select * from test3;
    58. +----+------------------------------------------+--------+
    59. | id | test | test2 |
    60. +----+------------------------------------------+--------+
    61. | 2 | a string | 100.20 |
    62. | 4 | a string containing a , comma | 102.20 |
    63. | 6 | a string containing a " quote | 102.20 |
    64. | 8 | a string containing a ", quote and comma | 102.20 |
    65. | 10 | \t | 102.20 |
    66. | 14 | \t | 102.20 |
    67. +----+------------------------------------------+--------+
    68. 6 rows in set (0.00 sec)
1.2.6.3. FIELDS和LINES注意事项
  • 众所周知,MySQL中反斜杠是SQL语句中特殊字符的转义字符,因此在sql语句中碰到特殊字符时,您必须指定一个或者两个反斜杠来为特殊字符转义(如在mysql中或者一些其他程序中,\n代表换行符,\t代表制表符,\代表转义符,那么需要使用\t来转义制表符,\n来转义换行符,\来转义转义符本身,这样才能正确写入数据库或者生成导出的数据文本,使用FIELDS ESCAPED BY子句指定转义符

  • 特殊字符列表如

    1. \0 ASCII NUL (X'00') 字符
    2. \b 退格字符
    3. \n 换行符
    4. \r 回车符
    5. \t 制表符
    6. \Z ASCII 26 (Control+Z)
    7. \N NULL值,如果转义符值为空,则会直接导出null字符串作为数据,这在导入时将把null作为数据导入,而不是null符号
  • 如果数据中包含了ENCLOSED BY '"'子句指定字段引用符号,则与字段引用符号相同数据字符也会被自动添加一个反斜杠进行转义(如果转义符指定为空,则可能会导致数据在导入时无法正确解析)。如果数据中包含了FIELDS TERMINATED BY 子句指定的字段分隔符,则以FIELDS ENCLOSED BY子句指定的字段引用符号为准,被引起来的整个部分作为一整列的数据,列值之间的数据包含字段分隔符不会被转义,而是作为数据处理,但数据中包含的字段引用符会被转义(在数据中包含了字段分隔符的情况下,如果字段引用符号没有指定或者指定为空值,则可能在导入数据时无法正确解析)。如果数据中包含了FIELDS ESCAPED BY子句指定的转义符,字段引用符和行分隔符使用默认值,则在数据中的转义符会被转义(只要不为空,则不管字段分隔符和转义字符定义为什么值,都会被转义),默认情况下,不建议随意更改换行符和转义符,除非必须且你需要校验修改之后数据能够正确导入
    1. # 字段引用符为",数据中包含",转义符和换行符保持默认,导入数据时不会有任何问题
    2. admin@localhost : xiaoboluo 09:46:14> select * from test3;
    3. +----+------------------------------------------+--------+
    4. | id | test | test2 |
    5. +----+------------------------------------------+--------+
    6. | 2 | a string | 100.20 |
    7. | 4 | a string containing a , comma | 102.20 |
    8. | 6 | a string containing a " quote | 102.20 |
    9. | 8 | a string containing a ", quote and comma | 102.20 |
    10. +----+------------------------------------------+--------+
    11. 4 rows in set (0.00 sec)
    12. admin@localhost : xiaoboluo 09:46:17> select * from test3 into outfile "/tmp/test3.txt" FIELDS OPTIONALLY enclosed BY '"';
    13. Query OK, 4 rows affected (0.00 sec)
    14. admin@localhost : xiaoboluo 09:46:23> system cat /tmp/test3.txt;
    15. 2 "a string" "100.20"
    16. 4 "a string containing a , comma" "102.20"
    17. 6 "a string containing a \" quote" "102.20"
    18. 8 "a string containing a \", quote and comma" "102.20" # 可以看到与字段引用符相同的符号数据被转义了
    19. admin@localhost : xiaoboluo 09:54:41> truncate test3;
    20. Query OK, 0 rows affected (0.01 sec)
    21. admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    22. Query OK, 4 rows affected (0.00 sec)
    23. Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
    24. admin@localhost : xiaoboluo 09:58:45> select * from test3;
    25. +----+------------------------------------------+--------+
    26. | id | test | test2 |
    27. +----+------------------------------------------+--------+
    28. | 2 | a string | 100.20 |
    29. | 4 | a string containing a , comma | 102.20 |
    30. | 6 | a string containing a " quote | 102.20 |
    31. | 8 | a string containing a ", quote and comma | 102.20 |
    32. +----+------------------------------------------+--------+
    33. 4 rows in set (0.00 sec)
    34. # 如果字段引用符为",字段分隔符为,且数据中包含字段引用符"和字段分隔符,,转义符和换行符保持默认,这在导入数据时不会有任何问题
    35. admin@localhost : xiaoboluo 09:53:45> select * from test3 into outfile "/tmp/test3.txt" FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    36. Query OK, 4 rows affected (0.00 sec)
    37. admin@localhost : xiaoboluo 09:54:29> system cat /tmp/test3.txt;
    38. 2,"a string","100.20"
    39. 4,"a string containing a , comma","102.20"
    40. 6,"a string containing a \" quote","102.20"
    41. 8,"a string containing a \", quote and comma","102.20"
    42. admin@localhost : xiaoboluo 09:54:41> truncate test3;
    43. Query OK, 0 rows affected (0.01 sec)
    44. admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    45. Query OK, 4 rows affected (0.00 sec)
    46. Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
    47. admin@localhost : xiaoboluo 09:58:45> select * from test3;
    48. +----+------------------------------------------+--------+
    49. | id | test | test2 |
    50. +----+------------------------------------------+--------+
    51. | 2 | a string | 100.20 |
    52. | 4 | a string containing a , comma | 102.20 |
    53. | 6 | a string containing a " quote | 102.20 |
    54. | 8 | a string containing a ", quote and comma | 102.20 |
    55. +----+------------------------------------------+--------+
    56. 4 rows in set (0.00 sec)
    57. # 但是,如果在字段引用符为",数据中包含",字段分隔符使用逗号,换行符保持默认的情况下,转义符使用了空串,这会导致在导入数据时,第四行无法正确解析,报错
    58. admin@localhost : xiaoboluo 09:58:01> load data infile '/tmp/test3.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',';
    59. Query OK, 4 rows affected (0.00 sec)
    60. Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
    61. admin@localhost : xiaoboluo 09:58:45> select * from test3;
    62. +----+------------------------------------------+--------+
    63. | id | test | test2 |
    64. +----+------------------------------------------+--------+
    65. | 2 | a string | 100.20 |
    66. | 4 | a string containing a , comma | 102.20 |
    67. | 6 | a string containing a " quote | 102.20 |
    68. | 8 | a string containing a ", quote and comma | 102.20 |
    69. +----+------------------------------------------+--------+
    70. 4 rows in set (0.00 sec)
    71. admin@localhost : xiaoboluo 09:58:49> select * from test3 into outfile "/tmp/test3_test.txt" FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',' escaped by '';
    72. Query OK, 4 rows affected (0.00 sec)
    73. admin@localhost : xiaoboluo 10:00:42> system cat /tmp/test3_test.txt;
    74. 2,"a string","100.20"
    75. 4,"a string containing a , comma","102.20"
    76. 6,"a string containing a " quote","102.20" #关于这一行数据,需要说明一下ENCLOSED BY子句,该子句指定的引用符号从一个FIELDS TERMINATED BY子句指定的分隔符开始,直到碰到下一个\
    77. 分隔符之间且这个分隔符前面一个字符必须是字段引用符号(如果这个分隔符前面一个字符不是字段引用符,则继续往后匹配,如第二行数据),在这之间的内容都会被当作整个列字符串处理,\
    78. 所以这一行数据在导入时不会发生解析错误
    79. 8,"a string containing a ", quote and comma","102.20" #这一行因为无法正确识别的字段结束位置,所以无法导入,报错终止,前面正确的行也被回滚掉(binlog_format=row)
    80. admin@localhost : xiaoboluo 10:00:49> truncate test3;
    81. Query OK, 0 rows affected (0.01 sec)
    82. admin@localhost : xiaoboluo 10:01:03> load data infile '/tmp/test3_test.txt' into table test3 FIELDS OPTIONALLY enclosed BY '"' TERMINATED by ',' escaped by '';
    83. ERROR 1262 (01000): Row 4 was truncated; it contained more data than there were input columns
    84. admin@localhost : xiaoboluo 10:01:33> select * from test3;
    85. Empty set (0.00 sec)
    86. # 数据中包含了默认的转义符和指定的字段分隔符,字段引用符和行分隔符使用默认值,则在数据中的转义符和字段分隔符会被转义(只要不为空,则不管字段分隔符和转义字符定义为什么值,\
    87. 都会被转义)
    88. admin@localhost : xiaoboluo 03:08:45> insert into test3(test,test2) values('\\t','102.20');
    89. Query OK, 1 row affected (0.00 sec)
    90. admin@localhost : xiaoboluo 03:17:29> select * from test3;
    91. +----+------------------------------------------+--------+
    92. | id | test | test2 |
    93. +----+------------------------------------------+--------+
    94. | 2 | a string | 100.20 |
    95. | 4 | a string containing a , comma | 102.20 |
    96. | 6 | a string containing a " quote | 102.20 |
    97. | 8 | a string containing a ", quote and comma | 102.20 |
    98. | 10 | \t | 102.20 |
    99. +----+------------------------------------------+--------+
    100. 5 rows in set (0.00 sec)
    101. admin@localhost : xiaoboluo 03:17:32> system rm -f /tmp/test3.txt;
    102. admin@localhost : xiaoboluo 03:17:39> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',';
    103. Query OK, 5 rows affected (0.01 sec)
    104. admin@localhost : xiaoboluo 03:17:42> system cat /tmp/test3.txt
    105. 2,a string,100.20
    106. 4,a string containing a \, comma,102.20
    107. 6,a string containing a " quote,102.20
    108. 8,a string containing a "\, quote and comma,102.20
    109. 10,\\t,102.20
  • 当您使用SELECT … INTO OUTFILE与LOAD DATA INFILE一起将数据从数据库写入文件,然后再将该文件读回数据库时,两个语句的FIELDS和LINES处理选项必须匹配。否则,LOAD DATA INFILE将解析错误的文件内容,示例
    1. # 假设您执行SELECT ... INTO OUTFILE语句时使用了逗号作为列分隔符:
    2. SELECT * INTO OUTFILE 'data.txt'
    3.  FIELDS TERMINATED BY ','
    4.  FROM table2;
    5. # 如果您尝试使用\t作为列分隔符,则它将无法正常工作,因为它会指示LOAD DATA INFILE在字段之间查找制表符,可能导致每个数据行整行解析时被当作单个字段:
    6. LOAD DATA INFILE 'data.txt' INTO TABLE table2
    7.  FIELDS TERMINATED BY '\t';
    8. # 要正确读取逗号分隔各列的文件,正确的语句是
    9. LOAD DATA INFILE 'data.txt' INTO TABLE table2
    10.  FIELDS TERMINATED BY ','
  • 任何FIELDS和LINES处理选项都可以指定一个空字符串(''),但强烈不建议在FIELDS TERMINATED BY、FIELDS ESCAPED BY 和LINES TERMINATED BY子句中使用空串(空格不算空串)作为转义符和换行符,可能导致许多意外的问题,除非你确定使用空串不会出现问题。如果不为空,注意FIELDS [OPTIONALLY] ENCLOSED BY和FIELDS ESCAPED BY子句指定的值只能指定单个字符(即字段引用符号和转义符只能使用单个字符)。但 FIELDS TERMINATED BY, LINES STARTING BY, and LINES TERMINATED BY子句的值可以是多个字符(即字段分隔符和换行符、行前缀字符可以使用多个字符)。例如,指定一个LINES TERMINATED BY'\r\ n'子句,表示指定行换行符为\r\n,这个也是WIN下的换行符
    1. # 如果LINES TERMINATED BY换行符指定了一个空字符,并且FIELDS TERMINATED BY字段分隔符指定的是非空的一个字符(或者使用默认值\t),则行也会以字段分隔符作为行的结束符\
    2. (表现行为就是文本中最后一个字符就是字段分隔符),即整个文本看上去就是一整行数据了
    3. admin@localhost : xiaoboluo 04:48:35> system rm -f /tmp/test3.txt;
    4. admin@localhost : xiaoboluo 04:53:59> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY ',' lines terminated by '';
    5. Query OK, 6 rows affected (0.00 sec)
    6. # 使用python查看文本内容,从下面的结果中可以看到,整个表的数据由于换行符为空,所以导致都拼接为一行了,最后行结束符使用了字段分隔符逗号
    7. >>> f = open('/tmp/test3.txt','r')
    8. >>> data = f.readlines()
    9. >>> data
    10. ['2,a string,100.20,4,a string containing a \\, comma,102.20,6,a string containing a " quote,102.20,8,a string containing a "\\, quote and comma,102.20,10,\\\\t,102.20,14,\\\\t,102.20,']
    11. >>>
    12. # 导入数据到表,这里新建一张表来进行导入测试,预防清理掉了表数据之后,文本内容又无法正确导入的情况发生
    13. admin@localhost : xiaoboluo 04:57:52> create table test4 like test3;
    14. Query OK, 0 rows affected (0.01 sec)
    15. admin@localhost : xiaoboluo 04:57:59> load data infile "/tmp/test3.txt" into table test4 FIELDS TERMINATED BY ',' lines terminated by '';
    16. Query OK, 6 rows affected (0.00 sec)
    17. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    18. admin@localhost : xiaoboluo 04:58:26> select * from test4; #从查询结果上看,数据正确导入表test4中了
    19. +----+------------------------------------------+--------+
    20. | id | test | test2 |
    21. +----+------------------------------------------+--------+
    22. | 2 | a string | 100.20 |
    23. | 4 | a string containing a , comma | 102.20 |
    24. | 6 | a string containing a " quote | 102.20 |
    25. | 8 | a string containing a ", quote and comma | 102.20 |
    26. | 10 | \t | 102.20 |
    27. | 14 | \t | 102.20 |
    28. +----+------------------------------------------+--------+
    29. 6 rows in set (0.00 sec)
    30. # 如果FIELDS TERMINATED BY和FIELDS ENCLOSED BY值都为空(''),则使用固定行(非限制)格式。使用固定行格式时,字段之间使用足够宽的空格来分割各字段。对于数据类型\
    31. 是TINYINT,SMALLINT,MEDIUMINT,INT和BIGINT,字段宽度分别为4,6,8,11和20个空格(无论数据类型声明的显示宽度如何),对于varchar类型使用大约298个空格(这个空格数量是自己\
    32. 数的。。。,猜想这个空格数量可能与字符集,varchar定义长度有关,因为我在尝试把varchar定义为50个字符的时候,空格少了156个左右)
    33. admin@localhost : xiaoboluo 04:58:31> system rm -f /tmp/test3.txt;
    34. admin@localhost : xiaoboluo 05:04:05> select * from test3 into outfile "/tmp/test3.txt" FIELDS TERMINATED BY '' lines terminated by '';
    35. Query OK, 6 rows affected (0.00 sec)
    36. admin@localhost : xiaoboluo 05:04:17> system cat /tmp/test3.txt #下面展示内容中把打断空格使用...代替
    37. 2 a string ... 100.20 ... 4 a string containing a , comma ... 102.20 ... 6 a string containing a " quote ...102.20 ... 8 a string containing a ", quote and comma ... 102.20 \
    38.  ... 10 \\t ... 102.20 ... 14 \\t ... 102.20 ... admin@localhost : xiaoboluo 05:04:35>
    39. # 现在,清理掉test4表,并载入数据,从下面的结果中可以看到,导入表中之后,虽然数据是对的,但是多了非常多的空格,那么也就意味着你需要使用程序正确地处理一下这些多余的空格之后,\
    40. 再执行导入
    41. admin@localhost : xiaoboluo 05:06:19> truncate test4;
    42. Query OK, 0 rows affected (0.01 sec)
    43. admin@localhost : xiaoboluo 05:06:47> load data infile "/tmp/test3.txt" into table test4 FIELDS TERMINATED BY '' lines terminated by ''; # 注意:这是在sql_mode=''时导入的,如果不修改\
    44. sql_mode请使用local关键字
    45. Query OK, 6 rows affected, 12 warnings (0.01 sec)
    46. Records: 6 Deleted: 0 Skipped: 0 Warnings: 12
    47. Note (Code 1265): Data truncated for column 'test' at row 1
    48. Note (Code 1265): Data truncated for column 'test2' at row 1
    49. Note (Code 1265): Data truncated for column 'test' at row 2
    50. Note (Code 1265): Data truncated for column 'test2' at row 2
    51. Note (Code 1265): Data truncated for column 'test' at row 3
    52. Note (Code 1265): Data truncated for column 'test2' at row 3
    53. Note (Code 1265): Data truncated for column 'test' at row 4
    54. Note (Code 1265): Data truncated for column 'test2' at row 4
    55. Note (Code 1265): Data truncated for column 'test' at row 5
    56. Note (Code 1265): Data truncated for column 'test2' at row 5
    57. Note (Code 1265): Data truncated for column 'test' at row 6
    58. Note (Code 1265): Data truncated for column 'test2' at row 6
    59. admin@localhost : xiaoboluo 05:07:09> select * from test4;
    60. +----+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
    61. | id | test | test2 |
    62. +----+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
    63. | 2 | a string | 100.20 |
    64. | 4 | a string containing a , comma | 102.20 |
    65. | 6 | a string containing a " quote | 102.20 |
    66. | 8 | a string containing a ", quote and comma | 102.20 |
    67. | 10 | \t | 102.20 |
    68. | 14 | \t | 102.20 |
    69. +----+------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
    70. 6 rows in set (0.00 sec)
  • NULL值的处理根据使用的FIELDS和LINES选项而有所不同
    1. # 对于默认的FIELDS和LINES值,NULL值被转义为\N输出,字段值\N读取时使用NULL替换并输入(假设ESCAPED BY字符为\)
    2. admin@localhost : xiaoboluo 05:17:07> alter table test3 add column test3 varchar(10); #添加一个字段test3,默认值会被填充为null
    3. Query OK, 0 rows affected (0.04 sec)
    4. Records: 0 Duplicates: 0 Warnings: 0
    5. admin@localhost : xiaoboluo 05:17:33> select * from test3; #查看表中的test3列数据
    6. +----+------------------------------------------+--------+-------+
    7. | id | test | test2 | test3 |
    8. +----+------------------------------------------+--------+-------+
    9. | 2 | a string | 100.20 | NULL |
    10. | 4 | a string containing a , comma | 102.20 | NULL |
    11. | 6 | a string containing a " quote | 102.20 | NULL |
    12. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    13. | 10 | \t | 102.20 | NULL |
    14. | 14 | \t | 102.20 | NULL |
    15. +----+------------------------------------------+--------+-------+
    16. 6 rows in set (0.00 sec)
    17. admin@localhost : xiaoboluo 05:17:37> select * from test3 into outfile "/tmp/test3.txt"; #执行导出
    18. Query OK, 6 rows affected (0.00 sec)
    19. admin@localhost : xiaoboluo 05:18:02> system cat /tmp/test3.txt #查看导出的文本文件,可以发现null被转义为\N了,这是为了避免数据字符串本身包含null值时无法正确区分数据类型的null值
    20. 2 a string 100.20 \N
    21. 4 a string containing a , comma 102.20 \N
    22. 6 a string containing a " quote 102.20 \N
    23. 8 a string containing a ", quote and comma 102.20 \N
    24. 10 \\t 102.20 \N
    25. 14 \\t 102.20 \N
    26. # 导入数据,从结果中可以看到\N被正确解析为了数据类型的null值
    27. admin@localhost : xiaoboluo 05:18:06> truncate test3;
    28. Query OK, 0 rows affected (0.01 sec)
    29. admin@localhost : xiaoboluo 05:20:36> load data infile '/tmp/test3.txt' into table test3;
    30. Query OK, 6 rows affected (0.01 sec)
    31. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    32. admin@localhost : xiaoboluo 05:20:52> select * from test3;
    33. +----+------------------------------------------+--------+-------+
    34. | id | test | test2 | test3 |
    35. +----+------------------------------------------+--------+-------+
    36. | 2 | a string | 100.20 | NULL |
    37. | 4 | a string containing a , comma | 102.20 | NULL |
    38. | 6 | a string containing a " quote | 102.20 | NULL |
    39. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    40. | 10 | \t | 102.20 | NULL |
    41. | 14 | \t | 102.20 | NULL |
    42. +----+------------------------------------------+--------+-------+
    43. 6 rows in set (0.00 sec)
    44. # 如果FIELDS ENCLOSED BY不为空,FIELDS escaped BY为空时,则将NULL值的字面字符串作为输出字符值。这与FIELDS ENCLOSED BY引用的字符串中包含的null值不同,\
    45. 后者读取为字符串'null',而前者读取到数据库中时被当作数据类型的null值,而不是数据的字符串null
    46. admin@localhost : xiaoboluo 05:20:57> update test3 set test3='null' where id=2; #更新id=2的test3列值为数据字符串的null
    47. Query OK, 1 row affected (0.00 sec)
    48. Rows matched: 1 Changed: 1 Warnings: 0
    49. admin@localhost : xiaoboluo 05:23:14> select * from test3;
    50. +----+------------------------------------------+--------+-------+
    51. | id | test | test2 | test3 |
    52. +----+------------------------------------------+--------+-------+
    53. | 2 | a string | 100.20 | null |
    54. | 4 | a string containing a , comma | 102.20 | NULL |
    55. | 6 | a string containing a " quote | 102.20 | NULL |
    56. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    57. | 10 | \t | 102.20 | NULL |
    58. | 14 | \t | 102.20 | NULL |
    59. +----+------------------------------------------+--------+-------+
    60. 6 rows in set (0.00 sec)
    61. admin@localhost : xiaoboluo 05:23:16> system rm -f /tmp/test3.txt;
    62. admin@localhost : xiaoboluo 05:24:05> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '"' escaped BY ''; #指定行引用符号为双引号",转义符为空导出数据
    63. Query OK, 6 rows affected (0.00 sec)
    64. admin@localhost : xiaoboluo 05:24:51> system cat /tmp/test3.txt #查看导出的文本文件,可以看到数据字符串的null被加了双引号,而数据类型的null没有加双引号
    65. "2" "a string" "100.20" "null"
    66. "4" "a string containing a , comma" "102.20" NULL
    67. "6" "a string containing a " quote" "102.20" NULL
    68. "8" "a string containing a ", quote and comma" "102.20" NULL
    69. "10" "\t" "102.20" NULL
    70. "14" "\t" "102.20" NULL
    71. admin@localhost : xiaoboluo 05:24:57> alter table test4 add column test3 varchar(10);
    72. Query OK, 0 rows affected (0.04 sec)
    73. Records: 0 Duplicates: 0 Warnings: 0
    74. admin@localhost : xiaoboluo 05:26:40> truncate test4; #这里使用test4表做测试,避免无法导入的情况发生
    75. Query OK, 0 rows affected (0.00 sec)
    76. admin@localhost : xiaoboluo 05:26:44> load data infile '/tmp/test3.txt' into table test4 FIELDS ENCLOSED BY '"' escaped BY ''; #指定字段引用符为双引号",转义符为空导入数据
    77. Query OK, 6 rows affected (0.00 sec)
    78. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    79. admin@localhost : xiaoboluo 05:27:02> select * from test4; #查看表test4中的数据,从结果中可以看到,数据导入正确
    80. +----+------------------------------------------+--------+-------+
    81. | id | test | test2 | test3 |
    82. +----+------------------------------------------+--------+-------+
    83. | 2 | a string | 100.20 | null |
    84. | 4 | a string containing a , comma | 102.20 | NULL |
    85. | 6 | a string containing a " quote | 102.20 | NULL |
    86. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    87. | 10 | \t | 102.20 | NULL |
    88. | 14 | \t | 102.20 | NULL |
    89. +----+------------------------------------------+--------+-------+
    90. 6 rows in set (0.00 sec)
    91. # 使用固定行格式(当FIELDS TERMINATED BY和FIELDS ENCLOSED BY都为空时使用),将NULL写为空字符串。这将导致表中的NULL值和空字符串在写入文件时无法区分,\
    92. 因为它们都以空字符串形式写入文本文件。如果您需要能够在读取文件时将其分开,则不应使用固定行格式(即不应该使用FIELDS TERMINATED BY和FIELDS ENCLOSED BY都为空)
    93. admin@localhost : xiaoboluo 05:29:11> system rm -f /tmp/test3.txt;
    94. admin@localhost : xiaoboluo 05:29:22> select * from test3 into outfile "/tmp/test3.txt" FIELDS ENCLOSED BY '' TERMINATED BY '';
    95. Query OK, 6 rows affected (0.00 sec)
    96. admin@localhost : xiaoboluo 05:29:43> system cat /tmp/test3.txt #从结果中看,是不是有点似曾相识呢?没错,前面演示过FIELDS TERMINATED BY和FIELDS ENCLOSED BY都为空的情况,\
    97. 使用了固定格式来导出文本,但是这里多了数据类型的null值处理,从下面的结果中已经看不到数据类型的null了,被转换为了空值(下面展示时把大段空格使用...代替)
    98. 2 a string ... 100.20 ... null
    99. 4 a string containing a , comma ... 102.20 ...
    100. 6 a string containing a " quote ... 102.20 ...
    101. 8 a string containing a ", quote and comma ... 102.20 ...
    102. 10 \\t ... 102.20 ...
    103. 14 \\t ... 102.20 ...
    104. admin@localhost : xiaoboluo 05:29:46> truncate test4; #清空test4
    105. Query OK, 0 rows affected (0.01 sec)
    106. admin@localhost : xiaoboluo 05:34:15> load data infile "/tmp/test3.txt" into table test4 FIELDS ENCLOSED BY '' TERMINATED BY ''; #执行导入文本到test4表,注意:这是在sql_mode=''时导入的,\
    107. 如果不修改sql_mode请使用local关键字
    108. Query OK, 6 rows affected, 24 warnings (0.01 sec)
    109. Records: 6 Deleted: 0 Skipped: 0 Warnings: 24
    110. Note (Code 1265): Data truncated for column 'test' at row 1
    111. Note (Code 1265): Data truncated for column 'test2' at row 1
    112. Note (Code 1265): Data truncated for column 'test3' at row 1
    113. Warning (Code 1262): Row 1 was truncated; it contained more data than there were input columns
    114. Note (Code 1265): Data truncated for column 'test' at row 2
    115. Note (Code 1265): Data truncated for column 'test2' at row 2
    116. Note (Code 1265): Data truncated for column 'test3' at row 2
    117. Warning (Code 1262): Row 2 was truncated; it contained more data than there were input columns
    118. Note (Code 1265): Data truncated for column 'test' at row 3
    119. Note (Code 1265): Data truncated for column 'test2' at row 3
    120. Note (Code 1265): Data truncated for column 'test3' at row 3
    121. Warning (Code 1262): Row 3 was truncated; it contained more data than there were input columns
    122. Note (Code 1265): Data truncated for column 'test' at row 4
    123. Note (Code 1265): Data truncated for column 'test2' at row 4
    124. Note (Code 1265): Data truncated for column 'test3' at row 4
    125. Warning (Code 1262): Row 4 was truncated; it contained more data than there were input columns
    126. Note (Code 1265): Data truncated for column 'test' at row 5
    127. Note (Code 1265): Data truncated for column 'test2' at row 5
    128. Note (Code 1265): Data truncated for column 'test3' at row 5
    129. Warning (Code 1262): Row 5 was truncated; it contained more data than there were input columns
    130. Note (Code 1265): Data truncated for column 'test' at row 6
    131. Note (Code 1265): Data truncated for column 'test2' at row 6
    132. Note (Code 1265): Data truncated for column 'test3
  • load data执行时如果表中有外键、辅助索引、唯一索引,那么会导致加载数据的时间变慢,因为索引也需要一同更新,可以使用对应参数关闭外键检查、唯一索引检查甚至关闭索引

    • 要在加载操作期间忽略外键约束,可以在执行load data语句之前执行SET foreign_key_checks = 0语句,执行完毕之后执行SET foreign_key_checks = 1或断开会话重连
    • 要在加载操作期间忽略唯一索引约束,可以在执行load data语句之前执行set unique_checks=0语句,执行完毕之后执行set unique_checks=1或断开会话重连
    • 在某些极端情况下(比如表中索引过多),您可以在执行load data语句之前通过执行ALTER TABLE … DISABLE KEYS语句关闭创建索引,在执行完load data语句之后执行ALTER TABLE … ENABLE KEYS来重新创建索引,注意该语句不能关闭主键索引
  • 如果在sql_mode设置为严格模式下,且不使用local和ignore关键字时,碰到缺少字段值会直接报错终止,但在sql_mode设置为严格模式下,使用了local和ignore关键字时,则行为与不使用严格模式类似
    LOAD DATA INFILE将所有输入视为字符串,因此您不能认为load data语句会像INSERT语句那样插入ENUM或SET列的数值。所有ENUM和SET值必须指定为字符串

  • LOAD DATA INFILE不支持的场景

    • 固定大小的行(FIELDS TERMINATED BY和FIELDS ENCLOSED BY都为空)不支持BLOB或TEXT列
    • 如果FIELDS TERMINATED BY和LINES STARTING BY指定相同的字符,则LOAD DATA INFILE无法正确解析
    • 如果FIELDS ESCAPED BY为空,则字段中包含了FIELDS ENCLOSED BY或LINES TERMINATED BY或FIELDS TERMINATED BY的字符时会导致LOAD DATA INFILE语句拒绝读取字段并报错。这是因为LOAD DATA INFILE无法正确确定字段或行在哪里结束
  • PS:在Unix上,如果需要LOAD DATA从管道读取数据,可以使用以下方法(该示例将/目录的列表加载到表db1.t1中,find命令挂后台持续查找内容并生成ls.dat文件,mysql 客户端使用-e选项来执行load data这个文件到表):

    1. mkfifo /mysql/data/db1/ls.dat
    2. chmod 666 /mysql/data/db1/ls.dat
    3. find / -ls> /mysql/data/db1/ls.dat&
    4. mysql -e "LOAD DATA INFILE 'ls.dat' INTO TABLE t1" db1
1.2.7. IGNORE number {LINES | ROWS}子句
  • 忽略输入文件中的前number行数据,使用子句ignore number lines指定忽略文本的前number行,在某些情况下生成的文本(如:mysql -e "select …." > xx.txt中)带有字段名称,在导入时会把这一行字段名称也当作数据,所以需要忽略掉这行字段名称
    1. admin@localhost : xiaoboluo 05:34:41> system cat /tmp/test3.txt
    2. id test test2 test3
    3. 2 a string 100.20 null
    4. 4 a string containing a , comma 102.20 NULL
    5. 6 a string containing a " quote 102.20 NULL
    6. 8 a string containing a ", quote and comma 102.20 NULL
    7. 10 \\t 102.20 NULL
    8. 14 \\t 102.20 NULL
    9. admin@localhost : xiaoboluo 05:41:35> truncate test4;
    10. Query OK, 0 rows affected (0.01 sec)
    11. admin@localhost : xiaoboluo 05:41:41> load data infile "/tmp/test3.txt" into table test4 ignore 1 lines; #载入文本时指定ignore 1 lines子句忽略文本中的前1行数据
    12. Query OK, 6 rows affected (0.00 sec)
    13. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    14. admin@localhost : xiaoboluo 05:42:22> select * from test4; #查询表test4中的数据,从下面的结果中可以看到数据正确
    15. +----+------------------------------------------+--------+-------+
    16. | id | test | test2 | test3 |
    17. +----+------------------------------------------+--------+-------+
    18. | 2 | a string | 100.20 | null |
    19. | 4 | a string containing a , comma | 102.20 | NULL |
    20. | 6 | a string containing a " quote | 102.20 | NULL |
    21. | 8 | a string containing a ", quote and comma | 102.20 | NULL |
    22. | 10 | \t | 102.20 | NULL |
    23. | 14 | \t | 102.20 | NULL |
    24. +----+------------------------------------------+--------+-------+
    25. 6 rows in set (0.00 sec)
  • LOAD DATA INFILE可用于读取外部数据源文件。例如,许多程序可以以逗号分隔的值(CSV)格式导出数据,字段用逗号分隔,并包含在双引号内,并带有一个字段列名的初始行。如果这样一个文件中的数据行的换行符再使用回车符,则load data语句可以这样编写:
    1. LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
    2.  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    3.  LINES TERMINATED BY '\r\n'
    4.  IGNORE 1 LINES;
    5. # 如果输入值不一定包含在引号内,请在ENCLOSED BY关键字之前使用OPTIONALLY,如:OPTIONALLY ENCLOSED BY '"',加上OPTIONALLY 可能会忽略数值类型的字段的引用符号,\
    6. 另外,如果你的csv文件第一行是数据而不是列名,那就不能使用IGNORE 1 LINES子句
1.2.8. (col_name_or_user_var,…)指定字段名称的子句
  • 默认情况下,如果使用load data语句时表名后边不带字段,那么会把整个表的字段数据都导入到数据库中,如:LOAD DATA INFILE'persondata.txt'INTO TABLE persondata;
  • 如果只想加载某些列,请指定列列表,如:LOAD DATA INFILE'persondata.txt'INTO TABLE persondata(col1,col2,…); ,要注意:如果输入文件中的字段值顺序与表中列的顺序不同,你需要对load data语句中的tb_name后跟的字段顺序做一下调整以对应文本文件中的字段顺序。否则,MySQL不能判断如何与表中的顺序对齐,列出列名时可以在tb_name后指定具体的列名,也可以使用表达式生成值指定给某个列名(使用set语句指定一个表达式,复制给一个变量,详见1.2.9小节),如果没有set语句,建议列名写在tb_name表名后边,方便理解,有set语句时就跟set语句写在一起

    • 如果发现文件中的列顺序和表中的列顺序不符,或者只想加载部分列,在命令中加上列的顺序时指定的字段名也不一定非要放在紧跟着表名,可以放在语句最后面也可以,如:load data infile "/tmp/filename.txt" into table emp fields terminated by ',' enclosed by '"' ignore 2 lines (id,content,name);如果只需要导入一个字段id,则把 (id,content,name)换做(id)即可

    • 使用示例参考1.1小节的“如果文本文件中的数据字段与表结构中的字段定义顺序不同,则使用如下语句指定载入表中的字段顺序”演示部分

1.2.8. SET col_name = expr,…子句
  • 将列做一定的数值转换后再加载,使用子句set col_name = expr,.. 指定,要注意:col_name必须为表中真实的列名,expr可以是任意的表达式或者子查询,只要返回的数据结果值能对应上表中的字段数据定义类型即可,注意,非set语句生成的列名,必须使用括号括起来,否则报语法错误。
    1. # 如果系统将id列的文本数据加上10以后再加载到表的test3列中,可以如下操作:
    2. admin@localhost : xiaoboluo 06:05:42> system rm -f /tmp/test3.txt;
    3. admin@localhost : xiaoboluo 06:06:00> select * from test3 into outfile "/tmp/test3.txt";
    4. Query OK, 6 rows affected (0.00 sec)
    5. admin@localhost : xiaoboluo 06:06:04> system cat /tmp/test3.txt
    6. 2 a string 100.20 null
    7. 4 a string containing a , comma 102.20 \N
    8. 6 a string containing a " quote 102.20 \N
    9. 8 a string containing a ", quote and comma 102.20 \N
    10. 10 \\t 102.20 \N
    11. 14 \\t 102.20 \N
    12. admin@localhost : xiaoboluo 06:07:49> truncate test4;
    13. Query OK, 0 rows affected (0.01 sec)
    14. admin@localhost : xiaoboluo 06:07:53> load data infile "/tmp/test3.txt" into table test4 (id,test,test2) set test3=id+10 ;
    15. ERROR 1262 (01000): Row 1 was truncated; it contained more data than there were input columns
    16. admin@localhost : xiaoboluo 06:08:02> select * from test4; #严格模式下因为文本中多了一个字段被截断了,所以拒绝导入
    17. Empty set (0.00 sec)
    18. admin@localhost : xiaoboluo 06:08:08> load data local infile "/tmp/test3.txt" into table test4 (id,test,test2) set test3=id+10 ; #可以使用local关键字强制进行截断最后一个字段的null值列进行导入,\
    19. 注意,如果不使用local关键字,那就需要修改sql_mode才能导入
    20. Query OK, 6 rows affected, 6 warnings (0.01 sec)
    21. Records: 6 Deleted: 0 Skipped: 0 Warnings: 6
    22. Warning (Code 1262): Row 1 was truncated; it contained more data than there were input columns
    23. Warning (Code 1262): Row 2 was truncated; it contained more data than there were input columns
    24. Warning (Code 1262): Row 3 was truncated; it contained more data than there were input columns
    25. Warning (Code 1262): Row 4 was truncated; it contained more data than there were input columns
    26. Warning (Code 1262): Row 5 was truncated; it contained more data than there were input columns
    27. Warning (Code 1262): Row 6 was truncated; it contained more data than there were input columns
    28. admin@localhost : xiaoboluo 06:10:45> select * from test4;
    29. +----+------------------------------------------+--------+-------+
    30. | id | test | test2 | test3 |
    31. +----+------------------------------------------+--------+-------+
    32. | 2 | a string | 100.20 | 12 |
    33. | 4 | a string containing a , comma | 102.20 | 14 |
    34. | 6 | a string containing a " quote | 102.20 | 16 |
    35. | 8 | a string containing a ", quote and comma | 102.20 | 18 |
    36. | 10 | \t | 102.20 | 20 |
    37. | 14 | \t | 102.20 | 24 |
    38. +----+------------------------------------------+--------+-------+
    39. 6 rows in set (0.00 sec)
    40. # 或者使用txt文件中的某些列进行计算后生成新的列插入,这里演示两个字段进行相加后导入另外一个字段中:
    41. admin@localhost : xiaoboluo 06:18:37> load data local infile "/tmp/test3.txt" into table test4 (id,test,test2) set test3=id+test2 ; # 注意,如果不使用local关键字,那就需要修改sql_mode才能导入
    42. Query OK, 6 rows affected, 6 warnings (0.00 sec)
    43. Records: 6 Deleted: 0 Skipped: 0 Warnings: 6
    44. Warning (Code 1262): Row 1 was truncated; it contained more data than there were input columns
    45. Warning (Code 1262): Row 2 was truncated; it contained more data than there were input columns
    46. Warning (Code 1262): Row 3 was truncated; it contained more data than there were input columns
    47. Warning (Code 1262): Row 4 was truncated; it contained more data than there were input columns
    48. Warning (Code 1262): Row 5 was truncated; it contained more data than there were input columns
    49. Warning (Code 1262): Row 6 was truncated; it contained more data than there were input columns
    50. admin@localhost : xiaoboluo 06:19:07> select * from test4;
    51. +----+------------------------------------------+--------+-------+
    52. | id | test | test2 | test3 |
    53. +----+------------------------------------------+--------+-------+
    54. | 2 | a string | 100.20 | 102.2 |
    55. | 4 | a string containing a , comma | 102.20 | 106.2 |
    56. | 6 | a string containing a " quote | 102.20 | 108.2 |
    57. | 8 | a string containing a ", quote and comma | 102.20 | 110.2 |
    58. | 10 | \t | 102.20 | 112.2 |
    59. | 14 | \t | 102.20 | 116.2 |
    60. +----+------------------------------------------+--------+-------+
    61. 6 rows in set (0.00 sec)
  • SET子句中使用用户变量,用户变量可以以多种方式使用
    1. # 可以直接使用一个用户变量并进行计算(计算表达式可以使用函数、运算符、子查询等都允许),然后赋值给test4列直接导入,而不需要从文件中读取test4列数据,该列数据也允许在文件中不存在
    2. admin@localhost : xiaoboluo 06:27:06> alter table test4 add column test4 varchar(20); #新建一个字段test4,用于导入set子句计算的值
    3. Query OK, 0 rows affected (0.01 sec)
    4. Records: 0 Duplicates: 0 Warnings: 0
    5. admin@localhost : xiaoboluo 06:27:56> truncate test4;
    6. Query OK, 0 rows affected (0.01 sec)
    7. admin@localhost : xiaoboluo 06:28:02> set @test=200; #设置一个用户变量
    8. Query OK, 0 rows affected (0.00 sec)
    9. admin@localhost : xiaoboluo 06:30:32> load data infile "/tmp/test3.txt" into table test4 (id,test,test2,test3) set test4=round(@test/100,0) ; #执行导入,使用set子句导入test4列通过表达式\
    10. round(@test/100,0)计算之后的值
    11. Query OK, 6 rows affected (0.00 sec)
    12. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    13. admin@localhost : xiaoboluo 06:30:52> select * from test4; #查看test4表中导入的数据,从以下结果中来看,导入数据正确
    14. +----+------------------------------------------+--------+-------+-------+
    15. | id | test | test2 | test3 | test4 |
    16. +----+------------------------------------------+--------+-------+-------+
    17. | 2 | a string | 100.20 | null | 2 |
    18. | 4 | a string containing a , comma | 102.20 | NULL | 2 |
    19. | 6 | a string containing a " quote | 102.20 | NULL | 2 |
    20. | 8 | a string containing a ", quote and comma | 102.20 | NULL | 2 |
    21. | 10 | \t | 102.20 | NULL | 2 |
    22. | 14 | \t | 102.20 | NULL | 2 |
    23. +----+------------------------------------------+--------+-------+-------+
    24. 6 rows in set (0.00 sec)
    25. # SET子句可以将一个内部函数返回的值直接导入到一个指定列
    26. admin@localhost : xiaoboluo 06:31:22> truncate test4;
    27. Query OK, 0 rows affected (0.01 sec)
    28. admin@localhost : xiaoboluo 06:40:58> load data infile "/tmp/test3.txt" into table test4 (id,test,test2,test3) set test4=now() ;
    29. Query OK, 6 rows affected (0.00 sec)
    30. Records: 6 Deleted: 0 Skipped: 0 Warnings: 0
    31. admin@localhost : xiaoboluo 06:41:02> select * from test4;
    32. +----+------------------------------------------+--------+-------+---------------------+
    33. | id | test | test2 | test3 | test4 |
    34. +----+------------------------------------------+--------+-------+---------------------+
    35. | 2 | a string | 100.20 | null | 2017-05-03 18:41:02 |
    36. | 4 | a string containing a , comma | 102.20 | NULL | 2017-05-03 18:41:02 |
    37. | 6 | a string containing a " quote | 102.20 | NULL | 2017-05-03 18:41:02 |
    38. | 8 | a string containing a ", quote and comma | 102.20 | NULL | 2017-05-03 18:41:02 |
    39. | 10 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
    40. | 14 | \t | 102.20 | NULL | 2017-05-03 18:41:02 |
    41. +----+------------------------------------------+--------+-------+---------------------+
    42. 6 rows in set (0.00 sec)
  • 使用指定列名或者变量列表时SET子句的使用受以下限制:

    • SET子句中的赋值表达式赋值运算符的左侧只能使用数据库表中的真实列名
    • 您可以在SET子句中的右侧使用子查询。返回要分配给列的值的子查询可能仅是标量子查询。此外,在这个子查询中您不能使用load data语句正在操作的表
    • SET子句不会处理IGNORE子句忽略的行。
    • 用固定行格式加载数据时,不能使用用户变量,因为用户变量值之间没有显示宽度
  • 如果输入行的字段太多(多过表中的字段数量),则会忽略额外的字段,并增加警告数。如果输入行的字段太少,那么输入字段缺少的表列被设置为其默认值,在解析文本文件时,空串字段数据与缺少字段值不同(空串会直接作为数据插入,而缺少字段时,会根据字段定义的默认值进行填充),如下:

    • 对于字符串类型,列设置为空字符串
    • 对于数字类型,列设置为0
    • 对于日期和时间类型,列将该类型设置为适当的“零”值
向AI问一下细节

免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。

AI