How to get UTF-8 characters to display

utf-8 da dk iso-8859 windows 1252

This topic has been archived. This means that you cannot reply to this topic.
14 replies to this topic

#1   geoffreywalton

geoffreywalton

    Available for Hire

  • Community Sponsor
  • 8,212 posts

Posted 06 January 2012 - 16:32

Hi

I have converted a db that has 2 languages, Danish and English, from rc2a to 2.3.1 by

1. Exporting the data from an RC2a site

2. Creating a db with

MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation utf8_general_ci

db collation utf8_general_ci

3. Importing the source file to this db specifying the input file is utf-8

4. Importing a script to alter fields/indexes etc as required.

5. Running a script to correct the data, populating indexes and null fields so it does not break a 2.3.1 site.

6. Exporting the file.

7. Importing the file to a 2.3.1 db.

The site works fine, but the Danish description fields display a black diamond with a question mark in it when viewed in Firefox, or a square in IE.

I have tried so many combinations of file formats during the import, export, changed the db charsets/settings but can not get the Danish characters to display.

When the product description record is viewed using phpMyAdmin the char set shows as utf-8

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
    xml:lang="en-gb"
    lang="en-gb"
    dir="ltr">
<head>
<link rel="icon" href="./favicon.ico" type="image/x-icon" />
<link rel="shortcut icon" href="./favicon.ico" type="image/x-icon" />
<title>phpMyAdmin 3.4.3.1 -
    phpmyadmin.xxxxhost.dk</title>
<meta http-equiv="Content-Type"
    content="text/html; charset=utf-8" />

In the language file I have tried setting the charset to utf-8 and UTF-8, with the locale set to da_DK.utf8

<!DOCTYPE html>
<html dir="ltr" lang="da"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This is when I was trying upper-case.

The site is

http://shop.fyns-dykkercenter.dk

There is .htaccess password protection on the site:

User : osc
pw: pepsi

You can see it on this product

http://shop.fyns-dyk...111&language=da

Can anyone suggest a solution?

TIA

G
Need help installing add ons/contributions, cleaning a hacked site or a bespoke development, check my profile

Virus Threat Scanner
My Contributions
Basic install answers.
Click here for Contributions / Add Ons.
UK your site.
Site Move.
Basic design info.

For links mentioned in old answers that are no longer here follow this link Useful Threads.

If this post was useful, click the Like This button over there ======>>>>>.

#2   multimixer

multimixer

    Lemons or Melons ?

  • Partner
  • 4,579 posts

Posted 06 January 2012 - 17:15

Geoffrey, some ideas
- How are the entries in the DB? Correct text or something else?
- Did you try this regarding correct DB entries in utf8?
- Do you enter everything in admin using English? What happens if you do it in Danish?

#3   GemRock

GemRock
  • Members
  • 2,074 posts

Posted 06 January 2012 - 17:52

UTF-8 is one of the most misleading things in web page design. If you select Encoding -> Western in the browser and the page shows correctly, that tells you the CHARSET should be iso-8859-1, which is what I used for a Danish website a couple of years ago. It has nothing to do with the db or the connection/collation etc.
Ken
commercial support - unProtected channel, not to be confused with the forum with same name - open to everyone who need some professional help: either PM/email me, or go to my website (URL can be found in my profile).
over 20 years of computer programming experience.

#4   DunWeb

DunWeb

    The Censored One

  • Members
  • 13,084 posts

Posted 06 January 2012 - 18:06

I have found that UTF-8 doesn't work for many languages other than English. In most cases I change it to iso-8859-1.



Chris
:|: Was this post helpful ? Click the LIKE THIS button :|:

See my Profile to learn more about add ons, templates, support plans and custom coding (click here)

#5   Biancoblu

Biancoblu

    1291 Giger's Alien

  • Community Sponsor
  • 745 posts

Posted 06 January 2012 - 18:55

Geoffrey,

I have a 4 language site and found that the best way to display all languages correctly is this:


set language files to charset=iso-8859-1

MySQL connection collation utf8_general_ci

db collation utf8_general_ci

Use html codes in place of foreign characters, in all the texts and in all the language files. The only places where I use foreign characters "as is" is in the language files that send emails, otherwise people get the html instead of the proper character in the email.

This displays languages correctly whether you view the site with Western ISO-8859-1 or Unicode UTF-8 in the browser. The only slight problem is with the USU5Pro rewrite addon: the category links and category titles show a question mark inside a black diamond ONLY if you force the browser to view in Unicode UTF-8, due to the character conversion files ('à' => 'a'). Normally, though, the browser picks up the page encoding (Western ISO-8859-1) and so displays it correctly.

Hope that helps.

Edited by Biancoblu, 06 January 2012 - 18:56.

~ Don't mistake my kindness for weakness ~

#6   geoffreywalton

geoffreywalton

    Available for Hire

  • Community Sponsor
  • 8,212 posts

Posted 06 January 2012 - 20:33

Wow, go out shopping and get all this.


George:

The characters in the db show as Danish, so the data is there in the db.


When the product description record id viewed using phpmy admin the char set shows as utf-8


I should have said

When the product description record is viewed using phpMyAdmin the char set shows as utf-8 and the Danish characters are displayed.

Bearing the above in mind, no didn't try those add-ons "yet".

Finally, when you edit the description fields using categories.php, the characters show as diamonds, but if you change them to the Danish character and save it, hey presto, they appear in the shop as Danish characters.

Thanks for the input.

Ken

I tried all the encodings on IE8 and FF and nothing displays the characters correctly. Or is it just my PC?

On this product

http://shop.fyns-dyk...111&language=da

the character between the chevrons

>>>> p� <<<<

should be "på" (pa with a small circle over the a).

Really I am looking for a solution to get what is in the db displayed rather than a workaround, but if changing the encoding in the browser doesn't display it then I am not sure what the next step could be.

Does this shake any more ideas out of the grey cells?

Thanks

G

Edited by geoffreywalton, 06 January 2012 - 20:34.


#7   Biancoblu

Biancoblu

    1291 Giger's Alien

  • Community Sponsor
  • 745 posts

Posted 06 January 2012 - 22:05

From your page info it appears you have a mix of iso-8859-1 and UTF-8.
My browser automatically picks Unicode UTF-8 when I view your site, and I see black diamonds in the body but not in the header nav bar or footer. If I force it to Western ISO 8859-1 it's the opposite, the body has the correct characters but not the header nor the footer. Have tried on FF 9, Chrome, IE 9.
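That pattern is what you would expect if the header and footer are sent as UTF-8 bytes while the body holds ISO-8859-1 bytes. Here is a minimal Python sketch of the effect; the sample text is made up for illustration and is not taken from the site:

```python
# Hypothetical page: header text stored as UTF-8, body text stored as ISO-8859-1.
header = "på lager".encode("utf-8")       # å becomes the two bytes C3 A5
body = "på lager".encode("iso-8859-1")    # å becomes the single byte E5
page = header + b" | " + body

# Browser set to Unicode (UTF-8): the header is fine, the body gets U+FFFD,
# the replacement character that Firefox draws as a black diamond.
as_utf8 = page.decode("utf-8", errors="replace")

# Browser set to Western (ISO-8859-1): the body is fine, the header's å
# turns into the two characters "Ã¥".
as_latin1 = page.decode("iso-8859-1")

print(as_utf8)
print(as_latin1)
```

Neither browser setting can show both parts correctly, because the page itself contains two different encodings.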

What if you set the DB collation to latin1_swedish_ci and the language files to charset=iso-8859-1?

Also, this addon may help: http://addons.oscomm....com/info/7628/

#8   geoffreywalton

geoffreywalton

    Available for Hire

  • Community Sponsor
  • 8,212 posts

Posted 06 January 2012 - 22:37

Well spotted.

Now I can change the encoding to iso-8859-1 in FF and it displays the correct character. What is going on? Couldn't before but must have been operator blindness.

In danish.php I have this

// charset for web pages and emails
define('CHARSET', 'ISO-8859-1');

The collation on the db is latin1_swedish_ci

Saved both danish.php and prodinfo.php as western ISO.

Still get the diamond and square in IE.

Also changed the db and desc fields to latin1_swedish_ci, still diamonds.

Where is the next brick wall to bang my head on?

Cheers

G

Edited by geoffreywalton, 06 January 2012 - 22:52.


#9   emikeu

emikeu
  • Members
  • 14 posts

Posted 07 January 2012 - 00:16

  • Sorry, I am unable to validate this document because on line 127 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
    The error was: utf8 "\xF8" does not map to Unicode
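For context, the byte 0xF8 is "ø" in ISO-8859-1 but can never appear in valid UTF-8, which fits a page that is declared as UTF-8 while containing ISO-8859-1 text. Below is a small Python sketch that finds such a byte the way the validator does; the function name and the sample word are my own, purely for illustration:

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Hypothetical Danish word saved as ISO-8859-1: the bytes are 72 F8 64.
sample = "rød".encode("iso-8859-1")

print(first_invalid_utf8(sample))                 # offset 1, byte value 248 (0xF8)
print(sample.decode("iso-8859-1"))                # the same bytes read as ISO-8859-1
print(first_invalid_utf8("rød".encode("utf-8")))  # None: properly encoded UTF-8 passes
```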


#10   emikeu

emikeu
  • Members
  • 14 posts

Posted 07 January 2012 - 00:21

The site is full of errors; please validate it. Set the encoding to ISO-8859-1.

#11   emikeu

emikeu
  • Members
  • 14 posts

Posted 07 January 2012 - 00:24

I think before validating, reverse the changes you made after step 7.

#12   Biancoblu

Biancoblu

    1291 Giger's Alien

  • Community Sponsor
  • 745 posts

Posted 07 January 2012 - 09:33


Saved both danish.php and prodinfo.php as western ISO.


Do you mean you saved them as western ISO after editing in notepad?

Personally I save all files in ANSI.

To recap, I have:
// charset for web pages and emails
define('CHARSET', 'ISO-8859-1');

in language files


my DB is:

MySQL connection collation utf8_general_ci
db collation utf8_general_ci
(but I believe it should also work if set at latin1_swedish_ci)

and I save all files in ANSI.


One thing you might look at: where did the Danish language pack come from? Have a look at how it's encoded. I ask because when I installed my language packs I had huge problems with displaying the foreign characters, so I had to go through all the files and change the encodings from UTF-8 to ISO-8859-1, then save them all in ANSI. Then I also replaced all foreign characters with HTML codes, i.e.:
&agrave;
instead of à.
My DB now contains HTML codes rather than foreign characters "as is".
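If you go the HTML-codes route, the replacement does not have to be done by hand. This is only a rough Python sketch of the idea, not what was actually used on the shop above; the function name is made up, and it relies on the standard `html.entities.codepoint2name` table:

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity, leaving ASCII (and markup) alone."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                              # plain ASCII stays as is
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")     # named entity, e.g. &agrave;
        else:
            out.append(f"&#{code};")                    # numeric fallback
    return "".join(out)


print(to_html_entities("à"))            # &agrave;
print(to_html_entities("<b>rød</b>"))   # <b>r&oslash;d</b>
```

Because the output is pure ASCII, it displays the same whichever of the two encodings the browser picks. As noted above, plain-text emails would show the codes literally, so this only suits text that ends up in HTML.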

Edited by Biancoblu, 07 January 2012 - 09:36.


#13   Xpajun

Xpajun
  • Members
  • 1,310 posts

Posted 07 January 2012 - 12:29

Geoffrey,

If you have (as I read it) upgraded an old latin1 database to utf-8 encoding, all you will have changed is the way newly entered characters are stored. All the existing characters/data in the database remain in latin1 encoding, which is why this happens:

Finally when you edit the description fields using categories.php, the characters show as diamonds but if you change them to the danish character and save it , hey presto they appear in the shop as Danish characters.


In other words, you have to go through all the bad characters and change them (I don't envy anyone this job on a non-English site).


Finally, your database MUST have utf8_general_ci collation, NOT latin1_swedish_ci, otherwise you are just making the mess bigger.
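If the old text really is sitting there as latin1 bytes, the correction could in principle be scripted rather than typed in by hand. Here is a hedged Python sketch of the per-value logic only. It assumes you can get at the raw bytes of each description (for example from an export file), and the function name is hypothetical:

```python
def to_utf8(raw: bytes, legacy: str = "iso-8859-1") -> bytes:
    """Return raw unchanged if it is already valid UTF-8, otherwise re-encode it from the legacy charset."""
    try:
        raw.decode("utf-8")
        return raw                                   # already valid UTF-8, leave alone
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was written as ISO-8859-1.
        return raw.decode(legacy).encode("utf-8")


old_value = b"p\xe5 lager"                 # "på lager" as stored by the old latin1 shop
new_value = "på lager".encode("utf-8")     # the same text as a UTF-8 shop expects it

print(to_utf8(old_value) == new_value)     # True: legacy bytes get converted
print(to_utf8(new_value) == new_value)     # True: good data is not converted twice
```

The check is a heuristic: a few latin1 byte sequences happen to be valid UTF-8 as well, so test on a copy of the data before trusting it.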


Julian

#14   Biancoblu

Biancoblu

    1291 Giger's Alien

  • Community Sponsor
  • 745 posts

Posted 07 January 2012 - 13:13

What kind of characters do you have in the DB? Diamonds or the proper Danish characters?

I too upgraded my old shop to 2.3.1, the old DB was latin1_swedish_ci.

I'm nowhere near an expert on this, but speaking from personal experience, if you have black diamonds in the DB you have to manually change them all to enter the correct character, which is a big pain but unavoidable.

If the correct character is already there, it's a matter of the Danish language pack not being correctly encoded. I had the exact same problem with my Italian pack: it had charset UTF-8 and was saved in UTF-8 without BOM, which was causing an incredible mess. I had to change all the files and what I posted above is what worked for me, but perhaps there are other/better ways of achieving it.

#15   techgoddess

techgoddess
  • Members
  • 34 posts

Posted 18 March 2012 - 01:01

Hi Geoffrey:

I’m running RC2A but was having the exact same problem: with utf-8, special characters (Spanish) showed correctly on one part of the page but not the other, and when I switched to iso-8859-1 the part that had been showing correctly no longer did, and vice versa.

The problem is just like Isabella said. Some of my Spanish pages are straight text with characters written as is, and some of my pages have the special characters correctly encoded. It’s probably best to just go through your site and correct the pages that are not coded correctly. This site (which I am in no way affiliated with), or a site like it, can help with the task: “http://www.charset.org/html-special-characters.php”.

My fix was to first fix/patch the problem immediately, because I have a live site, by editing “.htaccess”.

1) I opened .htaccess at the base of the website, added “AddDefaultCharset ISO-8859-1” and saved it.

2) Through phpMyAdmin, I updated my tables to “utf8_unicode_ci”.

3) I made sure catalog/includes/languages/“language.php” had both “@setlocale(LC_TIME, 'en_US.ISO_8859-1');” and, a little further down the page, “define('CHARSET', 'ISO_8859-1');” set to “iso_8859-1”. Verify the same for your “Admin” folder.

4) Go through the process of correcting your language files with the correct encoding, which is what I’m doing now. Because I set my .htaccess file I don’t have to feel hurried about it.
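A note on why the .htaccess step has such an immediate effect: it changes the charset in the HTTP Content-Type header, and as far as I know browsers follow that header over the meta tag when the two disagree. A small Python sketch for spotting such a disagreement is below; the header and HTML strings are made-up samples and both helper names are hypothetical:

```python
import re
from email.message import Message


def http_charset(content_type: str):
    """Charset parameter of a Content-Type header value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html: bytes):
    """Charset named in the first <meta ... charset=...> tag, lower-cased, or None."""
    m = re.search(rb'<meta[^>]+charset=["\']?([\w-]+)', html, re.IGNORECASE)
    return m.group(1).decode("ascii").lower() if m else None


# Made-up samples: the server says ISO-8859-1, the page itself says UTF-8.
header = "text/html; charset=ISO-8859-1"
page = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'

print(http_charset(header))                         # iso-8859-1
print(meta_charset(page))                           # utf-8
print(http_charset(header) != meta_charset(page))   # True: the two declarations disagree
```

In practice you would feed it the real response header and page source from the shop.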

Good luck!
