Personal computing discussed

Moderators: renee, SecretSquirrel, notfred

 
Kreshna Aryaguna Nurzaman
Gerbil XP
Topic Author
Posts: 406
Joined: Mon Apr 11, 2005 2:05 am

Loading text file to MySQL table: bizzare result.

Tue Dec 19, 2006 4:54 am

Well, I have the following text file:
[root@iceheart-rhel4es ~]# cat /home/mysql/pet.txt
Fluffy          Harold          cat     f       1993-02-04      /N
Ringo           Kreshna         dog     m       1995-08-07      2002-12-10
Goatse          Josh            goat    /N      2000-03-11      /N
Fang            Benny           dog     m       1979-08-31      /N
Browser         Firefox         dog     m       1979-08-31      1995-07-29
Chirpy          Gewn            bird    f       1998-09-11      /N
Whistler        Gwen            bird    /N      1997-12-09      /N
Slim            Benny           snake   m       1996-04-29      /N


The file is tab-separated. However, the number of tabs being used varies between columns. For example, there are two tabs between "Fluffy" and "Harold", but there's only one tabe between "cat" and "f".

Then I loaded the text file into a MySQL table named pet:
mysql> desc pet
    -> ;
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| name    | varchar(20) | YES  |     | NULL    |       |
| owner   | varchar(20) | YES  |     | NULL    |       |
| species | varchar(20) | YES  |     | NULL    |       |
| sex     | char(1)     | YES  |     | NULL    |       |
| birth   | date        | YES  |     | NULL    |       |
| death   | date        | YES  |     | NULL    |       |
+---------+-------------+------+-----+---------+-------+
6 rows in set (0.00 sec)

mysql> use menagerie
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> load data local infile '/home/mysql/pet.txt' into table pet;
Query OK, 8 rows affected, 24 warnings (0.01 sec)
Records: 8  Deleted: 0  Skipped: 0  Warnings: 24


Judging from above, it seems the loading was successfull.


But when I selected the table, the result is as follows:
mysql> select * from pet;
+----------+-------+---------+------+------------+------------+
| name     | owner | species | sex  | birth      | death      |
+----------+-------+---------+------+------------+------------+
| Fluffy   |       | Harold  |      | 0000-00-00 | 0000-00-00 |
| Ringo    |       | Kreshna |      | 0000-00-00 | 0000-00-00 |
| Goatse   |       | Josh    |      | 0000-00-00 | 0000-00-00 |
| Fang     |       | Benny   |      | 0000-00-00 | 0000-00-00 |
| Browser  |       | Firefox |      | 0000-00-00 | 0000-00-00 |
| Chirpy   |       | Gewn    |      | 0000-00-00 | 0000-00-00 |
| Whistler | Gwen  |         | b    | 0000-00-00 | 1997-12-09 |
| Slim     |       | Benny   |      | 0000-00-00 | 0000-00-00 |
+----------+-------+---------+------+------------+------------+
8 rows in set (0.00 sec)


Well I wonder why. The operating system, by the way, is RHEL 4 ES, not Windows, so I didn't use LINES TERMINATED BY '\r\n'; option, since IIRC it's for Windows text files.

What happens? Why, for instance, the '1997-12-09' became the death date for Whistler, while in the text file it is the birth date?

I suspect this somehow related to the number of tabs I'm using. For instance, I was using two tabs between pet's name and owner's name (for instance, between 'Fluffy' and 'Harold'). Thus, 'Harold' was placed in species column instead of owner.

However, if that's really the case, then I wonder what to do with a column's content that is longer than tab's length? For example, if the owner's name is 'Harold Graham Kennedy', then I will require more than a single tab for the column, am I correct?
 
muyuubyou
Grand Gerbil Poohbah
Posts: 3222
Joined: Wed Aug 28, 2002 6:19 am
Location: London, UK or Tokyo/Yokohama, Japan or Madrid, Spain

Tue Dec 19, 2006 5:21 am

It's the tabs. If you don't specify a FIELDS clause it defaults to:

FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'

Which won't understand multiple tabs. You can also try using some enclosure like " for each field (then escaping any " inside with \).
no sig
 
Kreshna Aryaguna Nurzaman
Gerbil XP
Topic Author
Posts: 406
Joined: Mon Apr 11, 2005 2:05 am

Tue Dec 19, 2006 5:35 am

So, uh, I have enclosed each field with ". The text file is now like this:
[root@iceheart-rhel4es ~]# cat /home/mysql/pet.txt
"Fluffy"                "Harold"                "cat"   "f"     "1993-02-04"
        \N
"Ringo"         "Kreshna"               "dog"   "m"     "1995-08-07"    "2002-12-10"
"Goatse"                "Josh"          "goat"  \N      "2000-03-11"    \N
"Fang"          "Benny"         "dog"   "m"     "1979-08-31"    \N
"Browser"               "Firefox"               "dog"   "m"     "1979-08-31"   "1995-07-29"
"Chirpy"                "Gewn"          "bird"  "f"
        "1998-09-11"    \N
"Whistler"      "Gwen"          "bird"  \N      "1997-12-09"    \N
"Slim"          "Benny"         "snake" "m"     "1996-04-29"    \N


Well the second time I loaded the file, the result was still incorrect:
mysql> load data local infile '/home/mysql/pet.txt' into table pet;
Query OK, 10 rows affected, 29 warnings (0.00 sec)
Records: 10  Deleted: 0  Skipped: 0  Warnings: 29

mysql> select * from pet;
+------------+--------------+-----------+------+------------+------------+
| name       | owner        | species   | sex  | birth      | death      |
+------------+--------------+-----------+------+------------+------------+
| "Fluffy"   |              | "Harold"  |      | 0000-00-00 | 0000-00-00 |
|            | NULL         | NULL      | NULL | NULL       | NULL       |
| "Ringo"    |              | "Kreshna" |      | 0000-00-00 | 0000-00-00 |
| "Goatse"   |              | "Josh"    |      | 0000-00-00 | NULL       |
| "Fang"     |              | "Benny"   |      | 0000-00-00 | 0000-00-00 |
| "Browser"  |              | "Firefox" |      | 0000-00-00 | 0000-00-00 |
| "Chirpy"   |              | "Gewn"    |      | 0000-00-00 | 0000-00-00 |
|            | "1998-09-11" | NULL      | NULL | NULL       | NULL       |
| "Whistler" | "Gwen"       |           | "    | NULL       | 0000-00-00 |
| "Slim"     |              | "Benny"   |      | 0000-00-00 | 0000-00-00 |
+------------+--------------+-----------+------+------------+------------+
10 rows in set (0.00 sec)


Should I put \\ at the end of each row?
 
excession
Graphmaster Gerbil
Posts: 1262
Joined: Fri Dec 31, 2004 3:19 pm
Location: Nottingham, UK

Tue Dec 19, 2006 7:52 am

Why not just have one tab between each entry? :P
i5-4670 | Asus H87M-E | MSI GTX 960 | 8GB DDR3 @800 | WD Green 2TB | 850 EVO 250GB | CM Masterkeys Pro L White | MX518 (original!) | Hyper 212 Evo | 6TB Ubuntu/ZFS NAS
I was going to tell a Chemistry joke, but all the good ones argon.
 
muyuubyou
Grand Gerbil Poohbah
Posts: 3222
Joined: Wed Aug 28, 2002 6:19 am
Location: London, UK or Tokyo/Yokohama, Japan or Madrid, Spain

Tue Dec 19, 2006 9:38 am

Well, it looks like a different problem now. One problem can be the \N value. Shouldn't you escape the \ to \\N?

I'm leaving this link here for reference:
http://dev.mysql.com/doc/refman/5.0/en/load-data.html
no sig
 
Kreshna Aryaguna Nurzaman
Gerbil XP
Topic Author
Posts: 406
Joined: Mon Apr 11, 2005 2:05 am

Tue Dec 19, 2006 12:11 pm

excession wrote:
Why not just have one tab between each entry? :P


Because I anticipated that the data length in a same column will vary. Some will be sufficient with single tab, some others won't.



muyuubyou wrote:
Well, it looks like a different problem now. One problem can be the \N value. Shouldn't you escape the \ to \\N?

I'm leaving this link here for reference:
http://dev.mysql.com/doc/refman/5.0/en/load-data.html


Thanks! Maybe I won't have to punch my own testicles in frustration after the next attempt. :)
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Tue Dec 19, 2006 1:07 pm

If you want to use tab as the field separator character, you shouldn't worry about trying to make the columns line up on the screen. Most programs which read tab-delimited data expect one tab per data field.

If you want the columns in the raw data file to line up on the screen, you should use a fixed format file, and pad fields with spaces.
Nostalgia isn't what it used to be.
 
Kreshna Aryaguna Nurzaman
Gerbil XP
Topic Author
Posts: 406
Joined: Mon Apr 11, 2005 2:05 am

Tue Dec 19, 2006 9:57 pm

just brew it! wrote:
If you want to use tab as the field separator character, you shouldn't worry about trying to make the columns line up on the screen.

I see! So that's my mistake form the very start. Thanks!


just brew it! wrote:
fixed format file, and pad fields with spaces.

Does it need a particular load data option when using this format, or will it be treated just the same as tab-delimited format?
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Tue Dec 19, 2006 10:08 pm

Kreshna Aryaguna Nurzaman wrote:
just brew it! wrote:
fixed format file, and pad fields with spaces.

Does it need a particular load data option when using this format, or will it be treated just the same as tab-delimited format?

The options would certainly be different. When it comes to loading external files I'm more familiar with MS SQL Server's "bcp" tool though, so I don't know the specific MySQL syntax off the top of my head.

Loading a table from a fixed-format text file is a pretty standard database operation, so I'm sure there's a way to do it.
Nostalgia isn't what it used to be.
 
muyuubyou
Grand Gerbil Poohbah
Posts: 3222
Joined: Wed Aug 28, 2002 6:19 am
Location: London, UK or Tokyo/Yokohama, Japan or Madrid, Spain

Wed Dec 20, 2006 3:00 am

I assumed it was a requirement ;) since obviously there are easier ways not using variable-length separators and such.

This is a safe cross-platform format:
LOAD DATA
INFILE 'C:/temp/data.csv'
INTO TABLE AN_RILEGATURE
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '|'
;

Then your file should look like this:
"animal";1;"rabbit"|
"site";2;"Tech Report"|

...for instance. This way you can ignore your OS's carrier return. Can even add multiple records per line if you want.
no sig

Who is online

Users browsing this forum: No registered users and 1 guest
GZIP: On