How to get UTF-8 characters to display


geoffreywalton

Hi

 

I have converted a db that has 2 languages, Danish and English, from rc2a to 2.3.1 by

 

1. Exporting the data from an RC2a site

 

2. Creating a db with

 

MySQL charset: UTF-8 Unicode (utf8)

MySQL connection collation utf8_general_ci

 

db collation utf8_general_ci

 

3. Importing the source file to this db specifying the input file is utf-8

 

4. Importing a script to alter fields/indexes etc as required.

 

5. Running a script to correct the data, populating indexes and null fields so it does not break a 2.3.1 site.

 

6. Exporting the file.

 

7. Importing the file to a 2.3.1 db.

 

The site works fine but the Danish description fields display a black diamond with a question mark in it when viewed in FF or a square on ie browsers.

 

I have tried so many combinations of file formats during the import, export, changed the db charsets/settings but can not get the Danish characters to display.
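For readers unfamiliar with the symptom, here is a minimal Python sketch (not part of the shop's code; the sample word and variable names are chosen only for illustration) of how the same bytes turn into a replacement character or into two-character garbage, depending on which charset the page declares:

```python
# Illustration only: the sample word and variable names are assumptions.
word = "på"

latin1_bytes = word.encode("latin-1")   # one byte (0xE5) for å
utf8_bytes = word.encode("utf-8")       # two bytes (0xC3 0xA5) for å

# Latin-1 bytes on a page declared as UTF-8: 0xE5 alone is not a complete
# UTF-8 sequence, so the decoder substitutes U+FFFD (the black diamond).
shown_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes on a page declared as ISO-8859-1: each byte is shown as its
# own character, giving two wrong characters instead of one å.
shown_as_latin1 = utf8_bytes.decode("latin-1")

print(repr(shown_as_utf8))
print(repr(shown_as_latin1))
```

If the stored bytes and the declared charset agree, neither substitution happens, whichever of the two encodings is used.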

 

When the product description record is viewed using phpMyAdmin, the charset shows as utf-8:

 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
   xml:lang="en-gb"
   lang="en-gb"
   dir="ltr">
<head>
<link rel="icon" href="./favicon.ico" type="image/x-icon" />
<link rel="shortcut icon" href="./favicon.ico" type="image/x-icon" />
<title>phpMyAdmin 3.4.3.1 -
   phpmyadmin.xxxxhost.dk</title>
<meta http-equiv="Content-Type"
   content="text/html; charset=utf-8" />

 

I have tried setting the charset in the language file to utf-8, UTF-8 and da_DK.utf8.

 

<!DOCTYPE html>
<html dir="ltr" lang="da"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

 

This is when I was trying upper-case.

 

The site is

 

http://shop.fyns-dykkercenter.dk

 

There is .htaccess password protection on the site:

 

User : osc

pw: pepsi

 

You can see it on this product

 

http://shop.fyns-dykkercenter.dk/product_info.php?products_id=111&language=da

 

Can anyone suggest a solution?

 

TIA

 

G

Need help installing add ons/contributions, cleaning a hacked site or a bespoke development, check my profile

 

Virus Threat Scanner

My Contributions

Basic install answers.

Click here for Contributions / Add Ons.

UK your site.

Site Move.

Basic design info.

 

For links mentioned in old answers that are no longer here follow this link Useful Threads.

 

If this post was useful, click the Like This button over there ======>>>>>.


Geoffrey, some ideas

- How are the entries in the DB? Correct text or something else?

- Did you try this regarding correct DB entries in utf8?

- Do you enter everything in admin using English? What happens if you do it in Danish?


UTF-8 is one of the most misleading things in web page design. If you select (in the browser) Encoding -> Western and the page shows correctly, that tells you the CHARSET should be iso-8859-1, which is what I used for a Danish website a couple of years ago. It has nothing to do with the db or connection/collation etc.

Ken

commercial support - unProtected channel, not to be confused with the forum with same name - open to everyone who need some professional help: either PM/email me, or go to my website (URL can be found in my profile).

over 20 years of computer programming experience.


I have found that UTF-8 doesn't work for many languages other than English. In most cases I change it to iso-8859-1.

 

 

 

Chris


Geoffrey,

 

I have a 4 language site and found that the best way to display all languages correctly is this:

 

 

set language files to charset=iso-8859-1

 

MySQL connection collation utf8_general_ci

 

db collation utf8_general_ci

 

Use html codes in place of foreign characters, in all the texts and in all the language files. The only places where I use foreign characters "as is" is in the language files that send emails, otherwise people get the html instead of the proper character in the email.

 

This displays languages correctly whether you view the site with Western ISO-8859-1 or Unicode UTF-8 in the browser. The only slight problem is if you use the USU5Pro rewrite addon: the category links and category titles show a question mark inside a black diamond ONLY if you force the browser to view in Unicode UTF-8, due to the character conversion files ('à' => 'a'). Normally though the browser picks up the page encoding (Western ISO-8859-1) and so displays it correctly.

 

Hope that helps.

~ Don't mistake my kindness for weakness ~


Wow, go out shopping and get all this.

 

 

George:

 

The characters in the db show as danish so the data is there in the db.

 

 

When the product description record is viewed using phpMyAdmin, the charset shows as utf-8

 

 

I should have said

 

When the product description record is viewed using phpmy admin the char set shows as utf-8 and the danish characters are displayed.

 

Bearing the above in mind, no didn't try those add-ons "yet".

 

Finally when you edit the description fields using categories.php, the characters show as diamonds but if you change them to the danish character and save it , hey presto they appear in the shop as Danish characters.

 

Thanks for the input.

 

Ken

 

I tried all the encodings on ie8 and FF and nothing displays the characters correctly. Or is it just my PC?

 

On this product

 

http://shop.fyns-dykkercenter.dk/product_info.php?cPath=25_26&products_id=111&language=da

 

the character between the chevrons

 

>>>> p� <<<<

 

should be "på" (pa with a small circle over the a).

 

Really I am looking for a solution to get what is in the db displayed rather than a workaround, but if changing the encoding in the browser doesn't display it then I am not sure what the next step could be.

 

Does this shake any more ideas out of the grey cells?

 

Thanks

 

G


From your page info it appears you have a mix of iso-8859-1 and UTF-8.

My browser automatically picks Unicode UTF-8 when I view your site, and I see black diamonds in the body but not in the header nav bar or footer. If I force it to Western ISO 8859-1 it's the opposite, the body has the correct characters but not the header nor the footer. Have tried on FF 9, Chrome, IE 9.
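A page that is half UTF-8 and half ISO-8859-1 can be confirmed by checking the raw bytes of each part separately. The following Python sketch is only an illustration: the function name and the sample header/body strings are assumptions, not taken from the shop.

```python
def classify(chunk: bytes) -> str:
    """Roughly classify a chunk of page bytes by encoding."""
    if chunk.isascii():
        return "ascii"            # identical in UTF-8 and ISO-8859-1
    try:
        chunk.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"        # most likely a single-byte charset such as ISO-8859-1

# Hypothetical page parts: header text saved as UTF-8, body text as Latin-1.
sections = {
    "header": "Søg".encode("utf-8"),
    "body": "på".encode("latin-1"),
}

for name, raw in sections.items():
    print(name, classify(raw))
```

Parts that report different results cannot all display correctly under one declared charset, which matches the header and body swapping places when the browser encoding is switched.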

 

What if you set the DB collation to latin1_swedish_ci and the language files to charset=iso-8859-1?

 

Also, this addon may help: http://addons.oscommerce.com/info/7628/


Well spotted.

 

Now I can change the encoding to iso-8859-1 in FF and it displays the correct character. What is going on? Couldn't before but must have been operator blindness.

 

In danish.php I have this

 

// charset for web pages and emails

define('CHARSET', 'ISO-8859-1');

 

The collation on the db is latin1_swedish_ci

 

Saved both danish.php and prodinfo.php as western ISO.

 

Still get the diamond and square in ie.

 

Also changed the db and desc fields to latin1_swedish_ci, still diamonds.

 

Where is the next brick wall to bang my head on?

 

Cheers

 

G


Sorry, I am unable to validate this document because on line 127 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xF8" does not map to Unicode
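The byte \xF8 is "ø" in ISO-8859-1 but is never valid in UTF-8, which is why the validator stops there. A small Python sketch of the same kind of check, reporting the line number and the offending byte (the function name and the sample page bytes are assumptions for illustration):

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line_number, offending_bytes) for each line that is not valid UTF-8.

    Only the first bad byte sequence on each line is reported.
    """
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            problems.append((number, line[err.start:err.end]))
    return problems

# Hypothetical page: the second line contains a Latin-1 encoded "ø" (0xF8).
page = b"<p>ok</p>\n<p>S\xf8g</p>\n"
print(find_invalid_utf8_lines(page))
print(b"\xf8".decode("latin-1"))
```

Running a check like this over the saved page source shows which lines come from a Latin-1 source (a file or a database field) on a page declared as UTF-8.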


 

Saved both danish.php and prodinfo.php as western ISO.

 

 

Do you mean you saved them as western ISO after editing in notepad?

 

Personally I save all files in ANSI.

 

To recap, I have:

// charset for web pages and emails
define('CHARSET', 'ISO-8859-1');

 

in language files

 

 

my DB is:

 

MySQL connection collation utf8_general_ci

db collation utf8_general_ci

(but I believe it should also work if set at latin1_swedish_ci)

 

and I save all files in ANSI.

 

 

One thing you might look at: where did the Danish language pack come from? Have a look at how it's encoded. I ask because when I installed my language packs I had huge problems with displaying the foreign characters, so I had to go through all the files and change the encodings from UTF-8 to ISO-8859-1, then save them all in ANSI, then I also replaced all foreign characters with HTML codes, i.e.:

&agrave;

instead of à.

My DB now contains HTML codes rather than foreign characters "as is".
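Replacing foreign characters with HTML codes by hand is tedious, so here is a hedged Python sketch of how it could be automated. The function name is an assumption and this is not an osCommerce tool; it only uses the standard library's entity table.

```python
from html.entities import codepoint2name

def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    ASCII characters (including markup such as < and &) are left untouched.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")   # named entity, e.g. &agrave;
        else:
            out.append(f"&#{code};")                  # numeric fallback
    return "".join(out)

print(to_html_entities("à"))
print(to_html_entities("på søen"))
```

The result is pure ASCII, so it displays the same whether the page is declared as ISO-8859-1 or UTF-8. As noted above, plain-text emails would still need the real characters.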


Geoffrey,

 

If you have (as I read it) upgraded an old latin1 database to utf-8 encoding, all you will have changed is the way newly entered characters are stored. All existing characters/data in the database remain in latin1 encoding... this is why this happens:

 

 

 

Finally when you edit the description fields using categories.php, the characters show as diamonds but if you change them to the danish character and save it , hey presto they appear in the shop as Danish characters.

 

 

In other words you have to go through all the bad characters and change them (don't envy anyone this job on a non-English site)

 

 

Finally, your database MUST have utf8_general_ci collation, NOT latin1_swedish_ci, otherwise you are just making the mess bigger.
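As a rough alternative to retyping every character, the old bytes could in principle be re-encoded in bulk, for example on the exported dump before it is imported. The Python sketch below is only an illustration under stated assumptions: the function name is made up, and the legacy data is assumed to be plain ISO-8859-1 (MySQL's latin1 differs slightly for a few bytes).

```python
def legacy_bytes_to_utf8(data: bytes, legacy_encoding: str = "latin-1") -> bytes:
    """Return data as UTF-8 bytes.

    Bytes that already decode as UTF-8 are returned unchanged, so running
    the function twice does not double-encode. Anything else is assumed to
    be in legacy_encoding (an assumption, not something detected).
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode(legacy_encoding).encode("utf-8")

old_value = b"p\xe5"                      # "på" as stored by the old latin1 shop
new_value = legacy_bytes_to_utf8(old_value)
print(new_value.decode("utf-8"))
```

Try this on a copy of the data first, since text that mixes both encodings in one field cannot be repaired this simply.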

 

 

Julian

My store is currently running Phoenix 1.0.3.0

I'm currently working on 1.0.7.2 and hope to get it live before 1.0.8.0 arrives (maybe 🙄 )

I used to have a list of add-ons here but I've found that with the ones that supporters of Phoenix get any other add-ons are not really necessary


what kind of characters do you have in the DB? diamonds or the proper danish character?

 

I too upgraded my old shop to 2.3.1, the old DB was latin1_swedish_ci.

 

I'm nowhere near an expert on this, but speaking from personal experience, if you have black diamonds in the DB you have to manually change them all to enter the correct character, which is a big pain but unavoidable.

 

If the correct character is already there it's a matter of the danish language pack not being correctly encoded. I had the exact same problem with my italian pack, it had charset UTF-8 and was saved in UTF-8 without BOM which was causing an incredible mess. I had to change all the files and what I posted above is what worked for me, but perhaps there are other/better ways of achieving it.
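To see how a language pack file is actually saved, its raw bytes can be inspected for a UTF-8 byte order mark and for UTF-8 validity. A minimal Python sketch, with a hypothetical function name and sample bytes standing in for a real language file:

```python
import codecs

def inspect_file_bytes(data: bytes):
    """Return (has_utf8_bom, is_valid_utf8) for the raw bytes of a file."""
    has_bom = data.startswith(codecs.BOM_UTF8)   # bytes EF BB BF at the start
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return has_bom, valid_utf8

# Sample contents; in practice read a file with open(path, "rb").read().
print(inspect_file_bytes(codecs.BOM_UTF8 + b"<?php // file saved with a BOM"))
print(inspect_file_bytes("<?php // perch\u00e8".encode("utf-8")))
print(inspect_file_bytes("<?php // perch\u00e8".encode("latin-1")))
```

A file that is valid UTF-8 while CHARSET is set to ISO-8859-1, or the reverse, is a likely source of the mismatch. Pure ASCII files report as valid UTF-8 and are safe under either charset.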


  • 2 months later...

Hi Geoffrey:

 

I’m running RC2A but was having the exact same problem: with utf-8, special characters (Spanish) showed correctly on one part of the page but not the other, and when I switched to iso-8859-1 the part that had been showing correctly no longer did, and vice versa.

 

The problem is just like Isabella said. Some of my Spanish pages are straight text with characters written as is, and some have the special characters correctly encoded. It’s probably best to just go through your site and correct the pages that are not coded correctly. This site (which I am in no way affiliated with), or one like it, can help with the task: http://www.charset.org/html-special-characters.php

 

My fix was to first fix/patch the problem immediately, because I have a live site, by editing “.htaccess”.

 

1) I opened up .htaccess at the base of the website, added “AddDefaultCharset ISO-8859-1” and saved it.

 

2) Through PhpMyAdmin, I updated my tables to “utf8_unicode_ci”.

 

3) I made sure catalog/includes/languages/“language.php” had both “@setlocale(LC_TIME, 'en_US.ISO_8859-1');” and, a little further down the page, “define('CHARSET', 'ISO_8859-1');” set to “iso_8859-1”. Verify the same for your “Admin” folder.

 

4) Go through the process of correcting your language files with the correct encoding, which is what I’m doing now. Because I set my .htaccess file I don’t have to feel hurried about it.
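After steps like these it is worth checking that the charset in the HTTP header and the one in the page's meta tag agree, because browsers generally give the HTTP header precedence over the meta tag. A hedged Python sketch (the function name and sample values are assumptions; the header string would normally come from the server's response):

```python
import re
from email.message import Message

def declared_charsets(content_type_header: str, html: str):
    """Return (charset from the HTTP header, charset from the <meta> tag).

    Both values are lowercased; None means no charset was declared there.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    header_charset = msg.get_content_charset()
    match = re.search(r"<meta[^>]*charset=[\"']?([\w-]+)", html, re.IGNORECASE)
    meta_charset = match.group(1).lower() if match else None
    return header_charset, meta_charset

# Sample values: a header forcing iso-8859-1 and a page that still says UTF-8.
header = "text/html; charset=iso-8859-1"
html = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
print(declared_charsets(header, html))
```

If the two values differ, the meta tag is effectively ignored, which can make a CHARSET change in the language file look as if it had no effect.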

 

Good luck!

 

Archived

This topic is now archived and is closed to further replies.
