MYSQL, problem with characters importing csv

Asked

Viewed 1,859 times

1

I am passing the Maxmind csv database to mysql and some accented characters are giving problem. In Latin appears a ? and in UTF nothing appears.

CSV   : Östra Frölunda
UTF8  :
LATIN1: ?stra Fr?lunda

I used the code below that I found on a site to import the whole file:

CREATE TABLE  `location` (
  `locID` int(10) unsigned NOT NULL,
  `country` char(2) default NULL,
  `region` char(2) default NULL,
  `city` varchar(45) default NULL,
  `postalCode` char(7) default NULL,
  `latitude` double default NULL,
  `longitude` double default NULL,
  `dmaCode` char(3) default NULL,
  `areaCode` char(3) default NULL,
  PRIMARY KEY  (`locID`),
  KEY `Index_Country` (`country`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=FIXED;

load data infile 'GeoLiteCity-Location.csv' 
into table `location` 
fields terminated by ',' optionally enclosed by '"' 
lines terminated by '\n' 
ignore 2 lines;

csv has other cities with illegible characters, but these I won’t even care at the moment, but I wanted to care the way it is to then sanitize.

  • How did you determine that the CSV was latin1 to choose the charset? One of the first things would be to correctly evaluate the CSV (even before importing).

  • From what I saw on another site, the file is iso. All examples I saw the charset is latin1... the only one I saw as UTF did the file conversion.

  • 1

    Latin 1 is iso-8859-1 (practically). But without being sure, it gets complicated. The best thing is to look in a code editor to be sure, or even open CSV as text in a browser that has manual coding option for you to test which works. For those who have a hexadecimal editor at hand also just look at the bytes. Without determining the correct charset, everything will be at the base of the kick (and an answer that fits your case may not suit anyone else).

  • Maxmind’s own PHP code that works with . DAT, defines ISO usage: mb_internal_encoding('ISO-8859-1'). So I believe that the other files are iso too, but anyway, then I try to find out more efficiently which charset.

  • 2

    Great chance to be, and in this case even if it is, need to see how it is that you are checking the data on the screen (one thing is to save wrong in DB, another is to view)

  • But if the charset were the same it would not be recorded as csv?

    1. carega the CSV in a text editor 2) checks if the characters are OK 3) if not, modifies the code system (eg. Latin1, UTF...) to have the characters coreto 4) Saved using UTF-8 5) Creates the data table using Charset=utf8_general_ci
  • @Papacharlie Consider commenting on this question the current situation and maybe even contribute an answer if case already solved.

  • 1

    @The Maxmind csv was incomplete and had some errors - in addition to not converting the characters correctly. I decided to leave the free version.

Show 4 more comments
No answers

Browser other questions tagged

You are not signed in. Login or sign up in order to post.