osCommerce

PHP 5.4 upgrade: now problem with special characters not displaying


vjo03

Hi!

Have you got a fix for this? Or tips on things to read?

 

I think I've got the same problem. I'm from Norway, and after PHP was upgraded to 5.4, certain fields that include Norwegian special letters (æøå) are blank, e.g. the user's name during a new order placement and the user's name on admin/index.php. The strange thing is that if I go to admin/customers.php, all names show up correctly. The database looks fine too.

 

I would appreciate all help.

 

Kind regards


Most likely this text is Latin-1 (or some other single byte encoding), and your pages (and possibly the database itself) are UTF-8. Thus, those letters are invalid and can't be displayed (which may break the entire thing). Have you looked at the text using phpMyAdmin to browse the database?


Thank you, @@MrPhil.

 

Everything should be running in Latin-1 (ISO-8859-1), as far as I can tell. It worked perfectly before the PHP upgrade. osCommerce is version 2.3.3, and I'm running on a shared host.

 

I have looked at the text using phpMyAdmin, and it is perfectly fine.

 

Does it make any sense to you that not all the pages are affected? E.g. a customer's name that contains æøå does not show up on admin/index.php - the name field is just blank. However, if I open admin/orders.php the same names show up just fine! According to the browser, both pages have charset = iso-8859-1.

 

Customer names that contain æøå are also missing/blank in e-mails. Customer names in reviews, the same. The review texts are also gone/blank, because they all contain æøå.

But for instance, product names/descriptions with æøå show up perfectly fine!

 

Have you got any idea why only certain pages/fields are affected?

 

Kind regards


osC 2.3.3 is normally UTF-8, isn't it? Anyway, have you checked your entire database to make sure that all tables and fields are Latin-1 instead of UTF-8, if that's what you want? Is the data itself correct (single byte codes for accented characters) and not the wrong encoding or a mixture of encodings?

 

If it was working properly before the PHP upgrade, ask if MySQL was also upgraded at the same time. Look around for any PHP error messages (error_log files, etc.) that might be reporting an error. During the upgrade, could any settings such as the PHP include path have gotten messed up? If MySQL was touched, maybe they changed settings there to UTF-8 or something.

 

I would be very concerned about the competence of a host who bumps you up to PHP 5.6, which is really bleeding edge for a production server.


My host did not upgrade to PHP 5.6, but to 5.4. (Perhaps I should have started my own thread due to this fact...)

 

I think it is correct that osC is normally UTF-8. I had a 2.2rc2a installation that was using 8859-1. When I migrated to osC 2.3.3 I had the db in latin1, and after a lot of testing the easiest option was to configure osC 2.3.3 to use 8859-1, since I couldn't choose anything on the server side.

 

I have looked through the db (specifically the problematic fields) and it is latin1 all over.

 

I asked the hostmaster whether anything was done to character sets. They told me that no such changes were made, but that PHP 5.4 itself had some changes that would default to UTF-8 instead of 8859-1 in some places (?).

 

I'll have to dig deeper into logs and see what I can find. I know way too little about the nature of charset configurations, but I find it very strange that admin/index.php cannot read the customers_firstname and lastname, while admin/orders.php displays it fine.

 

( Should I look into migrating it all over to UTF-8 and osC 2.3.4? )


Your host is correct: starting with PHP 5.4, the HTML-escaping functions (htmlspecialchars(), htmlentities() and html_entity_decode()) default to UTF-8, where previous versions defaulted to ISO-8859-1. You can change the output back to 8859-1, but this would mean a fair amount of custom code. It's easier to just switch over to UTF-8. You can convert your database using the admin tool (Admin >> Tools >> Database Tables) in osCommerce 2.3.4.
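
A possible explanation for why only some fields go blank, sketched in Python rather than PHP: as far as I know, htmlspecialchars() in PHP 5.4 validates its input as UTF-8 by default and returns an empty string when the bytes are not valid. The helper escape_like_php54 below is hypothetical and only imitates that behaviour; it is not a PHP or osCommerce function, and the sample name is made up.

```python
import html


def escape_like_php54(raw: bytes, charset: str = "utf-8") -> str:
    """Hypothetical stand-in for an HTML-escaping call that validates its input.

    Returns the escaped text, or an empty string when `raw` is not valid
    in `charset` (imitating the blank fields described in this thread).
    """
    try:
        text = raw.decode(charset)
    except UnicodeDecodeError:
        return ""
    return html.escape(text)


# Made-up customer name, stored as Latin-1 bytes like the shop's data.
name_latin1 = "Bjørn Åse".encode("latin-1")

blank = escape_like_php54(name_latin1)             # validated as UTF-8: rejected
shown = escape_like_php54(name_latin1, "latin-1")  # validated as Latin-1: accepted
plain = escape_like_php54(b"John Smith")           # pure ASCII is valid either way

print(repr(blank), repr(shown), repr(plain))
```

If that is what happens here, pages that print the raw database value without escaping it would be unaffected, which would fit the admin/index.php versus admin/orders.php symptom. Which pages actually escape the name is an assumption that would need checking in the code.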

 

Regards

Jim

See my profile for a list of my addons and ways to get support.


Did you remember to switch the encoding back to UTF-8 in your includes/languages/english.php (and other languages if you have them)?

setlocale(LC_ALL, array('en_US.UTF-8', 'en_US.UTF8', 'enu_usa'));

Regards

Jim


(For those just tuning in and confused: this thread was split off from an earlier thread that it had hijacked.)

 

If the entire database (tables and fields) was successfully converted in phpMyAdmin from Latin-1 to UTF-8, and all configuration PHP code was converted from ISO-8859-1 or Latin-1 to UTF-8, and you got rid of any UTF-8 Byte Order Marks that might have worked their way into your code due to using a bozo editor, it ought to be working. If it isn't, you might have missed something. Be sure to check for any originally UTF-8 multibyte characters accidentally converted to multiple Latin-1 characters, and then back to multiple UTF-8 characters. This can happen when using phpMyAdmin to convert from Latin-1 to UTF-8 (when you have characters that were already UTF-8 in a Latin-1 database) or when importing a UTF-8-containing .sql backup file and neglecting to tell phpMyAdmin or MySQL that the file wasn't the default Latin-1.
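
The double conversion described above can be reproduced in a few lines. This is only a Python sketch of the byte-level effect, not of what phpMyAdmin does internally; the variable names are illustrative.

```python
original = "æøå"

utf8_bytes = original.encode("utf-8")     # correct UTF-8: 2 bytes per letter
misread = utf8_bytes.decode("latin-1")    # each byte mistaken for one Latin-1 character
double_encoded = misread.encode("utf-8")  # "converted" to UTF-8 a second time

# Undo the extra round trip: back to the single-byte view, then decode properly.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")

print(len(utf8_bytes), misread, len(double_encoded), repaired)
```

The repair step only works when every affected character went through exactly one extra round trip, so it is worth testing on a copy of the data first.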

 

All-in-all, it sounds like you still have Latin-1 text (accented characters) in database fields that are declared UTF-8. This can cause all sorts of problems in database operations, PHP function calls, and browser display of these invalid characters. You can try browsing the fields in phpMyAdmin, making sure that phpMyAdmin is itself displaying in UTF-8 mode. Copy-and-paste from a Windows word processor or editor (usually Windows-1252 encoding) into a UTF-8 browser window can bring in invalid Latin-1 codes if the editor doesn't provide a UTF-8 version to the clipboard, or your browser doesn't make use of it. I've seen that happen, too.

 

You definitely want to get your site properly working with the current 2.3.3, before thinking about upgrading to 2.3.4. Even then, you might want to hold off until after the Christmas rush (if you have time, you might play with your test installation).


@@kymation: I indeed remembered to change the encoding in the language file.

 

Also, I tried to register a new customer account (with a name including æøå) and run through a new order after I converted the db - but the problem was still the same. The name does not show in the order process if it contains æøå.

Thanks for cleaning up my mess and splitting the threads, @@MrPhil. I agree that it should work, and I'm sure I have done something wrong.

So what I did: I dumped the db to a .sql file via phpMyAdmin. I chose UTF-8. I opened the file in Notepad++ and verified that the encoding was "UTF-8 without BOM". In addition I replaced all occurrences of latin1 with utf8 in the file, saved the file, and used that file to create my new test database. In phpMyAdmin, everything looked fine.

 

I duplicated my osCommerce installation (and changed the configuration files to match the test installation dir). But it came out the exact same way as earlier.

 

Then, I decided to download a new fresh osCommerce 2.3.3 and replace all the files in my test dir with those. osC 2.3.3 is native UTF-8 right? Still, it did not work out...

 

Strange thing though, when I open a .php-file from the newly downloaded osC 2.3.3 in Notepad++, it shows me that it is encoded in "UTF-8 without BOM". However, after upload, when I load the page, my browser tells me that it is "Western European (ISO)" encoding (?).

 

I'm sure I'm doing something wrong somewhere. Or is something fooling me? (I use WinSCP to upload the files via FTP - could that be a culprit?)

 

I also noticed that in Norwegian, the reviews are there with their picture and title. (The customer's name may also show if it doesn't contain æøå, and probably the text too.) However, when I switch to English as the display language, it just says "There are currently no product reviews"...

 

You are talking about accented characters and single byte codes. I have no clue about that. How do I know what's in use?

 

Thank you so much for all help and support! Kind regards.


"Strange thing though, when I open a .php-file from the newly downloaded osC 2.3.3 in Notepad++, it shows me that it is encoded in 'UTF-8 without BOM'. However, after upload, when I load the page, my browser tells me that it is 'Western European (ISO)' encoding (?).

I'm sure I do something wrong somewhere. Or is it anything that fools me (i use WinSCP to upload the files via FTP - could that be a culprit?) ?"

Note that PHP files should be pure ASCII (Latin-1/Windows-1252/UTF-8 is irrelevant, as there should be no non-ASCII characters). The exception is for osC's "language" files, containing accented and other non-ASCII text strings. They would be a specific encoding such as Latin-1 or UTF-8. .sql (database backup) files would also contain non-ASCII text.

 

If you are sure it's a UTF-8 file on your PC and it contains accented (non-ASCII) characters, but shows up as Latin-1 on the server, it's probably not the FTP client. I've never seen one that made it easy to accidentally change encoding on the fly, but I suppose it's possible. First, browse the uploaded file from your hosting control panel editor, to see if it appears to have single byte or multibyte accented characters. Most editors will tell you what they think the file encoding is. Second, check that the resulting HTML (browser: view source) actually gives UTF-8 as the encoding (i.e., you didn't upload an old version of PHP files). Also confirm that the browser itself thinks that the HTML is UTF-8 and not something else (view character set or view character encoding). Third, check your .htaccess to make sure you don't have some command to override encoding and force Latin-1 or Windows-1252. Fourth, ask your host to check that they haven't misconfigured something -- I have seen misconfigured hosts that override encoding requests and force a specific encoding such as Latin-1.

 

If your problem is that accented characters are disappearing (possibly taking the rest of the text with them), but the rest of the page comes up OK, I suspect that the problem is back at the database end. Specifically, you've got Latin-1 encoded accented characters in the database, in UTF-8 declared fields, and they are causing either MySQL or some PHP character and string functions to blow up. In the latter case (at least), there should be some error messages logged.

 

If everything is supposed to be UTF-8 encoded, but you find some Latin-1/Western/Windows-1252 text (single byte accented characters) in the database, you'll have to manually edit those fields in phpMyAdmin to fix the text (overwrite those characters with UTF-8-encoded text).


Thanks for clarifying things. Still no progress here...

Let's, for now, just forget about the Norwegian language pack that uses Latin-1.

I have uploaded, as said, a fresh osC 2.3.3 and use it with the default English language in UTF-8 (it is configured so in the language file) and the output HTML code says it is so.

Then - only the db could be the problem, right?

The products_name and products_description are fine all over. The customers_firstname and customers_lastname are two of the fields that cause problems. How could I determine how these fields are declared in phpMyAdmin? Could I see anywhere (1) which charset the fields are declared as, (2) which charset the contents are in and - (3) does the "collation" affect the values upon create and update, or is it there for sorting purposes only?

If I submit the creation of a new user account, with the site running in UTF-8 with a UTF-8 language pack into my UTF-8 converted db, I see the exact same problems. So it won't be the insertion that is failing, but rather the reading?

I have nothing declared in .htaccess besides access control.

The host hasn't provided me with any control panel or editor. No log files either, as far as I know. Let me ask them about this. You suggest I should ask whether they have misconfigured something. I will, but what specifically should I ask about? I have already asked them whether they changed anything during the upgrade - they say they did not.

 

By the way, I see that MySQL character set on the server is UTF-8 Unicode (utf8).

Through server info in osC admin, I see that PHP "default_charset" has "no value", neither local nor master. (Unfortunately, I don't know the output with the old PHP version). Could this be a problem?

 

I have a feeling that I appear as a person who has no idea what I'm doing... Truth is that character encoding has always been my worst nightmare... :P

 

You have no idea how thankful I am for any suggestions!


Hm.. I got something.

 

I found this thread. Look at the tip in post #4. It is about adding "tep_db_query("SET character set utf8");" after "tep_db_connect..." in the ..._top-files.

I tested this, and got a result: The missing text came back, but now all æøå's were replaced by unknown characters. (I really mean all æøå's - every æøå in reviews, product descriptions, categories.. and so on).

 

So what I think is that I have now forced osC to talk to the db using utf8. As it didn't get the correct response back, my conclusion was that my so-called "converted" db was not properly converted after all.

 

I took a fresh export of the original db and converted it to "UTF-8 without BOM" using Notepad++. Afterwards, I used Notepad, Notepad++ and "file -i" on Linux to confirm that the .sql-file was really UTF-8. I started importing this file back into my test db. Pretty soon, I ran into syntax errors everywhere in my products descriptions where apostrophes were used.

 

Last time I "converted", I had no such issues so I took this as a sign that my last convert was wrong. After fixing a whole lot of those syntax errors, I was ready to go. Unfortunately, the results just confuse me: The results are exactly the same...

 

Prior to the first test, I had restored the original ..._top-files. Without the "fix" to the _top-files, the usual text is gone - and with the "fix" in place, all æøå are unknown characters.

I have tried to play with the charset in the language files - but I don't get any new results.

 

@@MrPhil, what would you say this indicates?

 

How can I be sure which charset my db really is and which charset the contents of my db really is?

 

May it be so (or is it possible) that my db has been in UTF-8 all the time but what I have really done is fill it with latin1-encoded stuff? (Latin1 is a subset of unicode? And could that be the reason why it always has worked?) Perhaps it is impossible to be unaligned like that...


Well, you look not only at the encoding/collation specified for the whole database, but also at each table, and at each text field within a table. Make sure that you don't have one encoding overriding another at the table or field level.

 

Once you are sure that everything is the same encoding and collation, you can browse text fields (especially the problem ones) in phpMyAdmin and check that the accented characters show up correctly. Check that phpMyAdmin itself is displaying with UTF-8 encoding (its web page) by looking at its web page source in your browser. Also ask your browser what encoding or character set it thinks it's using for a given page (View > Character Set or something similar).

 

If it's just one page showing up with odd encoding, take a careful look for UTF-8 BOMs (badly edited PHP files) and any sign of hard coded HTML specifying the wrong encoding. You would have to look at the browser Page Source and work backwards through all the PHP files to see where any wrong encoding came from.


Here's what I have.

 

Database:

 

(1) COLLATION
SHOW VARIABLES LIKE "collation_database"
collation_database     utf8_general_ci

(2) CHARSET
SHOW VARIABLES LIKE "character_set_database"
character_set_database     utf8

 

--

 

Table:

 

(1) COLLATION
SHOW TABLE STATUS
All collations are utf8_general_ci

(2) CHARSET
SHOW CREATE TABLE mytable;
DEFAULT CHARSET=utf8

 

--

 

Column:

 

SELECT
  COLUMN_NAME,
  TABLE_NAME,
  CHARACTER_SET_NAME,
  COLUMN_TYPE,
  COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydbname'

gives me CHARACTER_SET_NAME utf8 (for all text fields)

 

--

 

What more is there to verify? Could I check the values of fields?

 

The "fix" mentioned in my last post, that forces the db connection to use utf8 is not good either. (I really did not consider it a "fix" any time, rather a debug setting...). If this setting is activated and a user registers a name with æøå, the name isn't inserted to the db.

 

The characters show up fine in phpMyAdmin, and the page itself is utf-8, according to my browser and the page source.

 

I'm convinced there is a mismatch somewhere in my setup, but I can't find it..


"I ran into syntax errors everywhere in my products descriptions where apostrophes were used."

Ah, your apostrophes are almost surely MS Smart Quotes, and neither proper ASCII quotes nor proper Unicode quotes. If you copied and pasted from a Word etc. document (Windows-1252, or perhaps in your case, a Nordic or Baltic Windows-125x variant), especially into a Latin-1 browser page, that's very easy to do. Into a UTF-8 browser page, it should pick the UTF-8 encoded version from the clipboard, if it's offered, but there's no guarantee.

 

Note that MS Smart Quotes may not be converted (in-place) to their appropriate multibyte UTF-8 equivalents if phpMyAdmin (MySQL) has been told that the text to be converted is Latin-1 rather than Windows-1252.  It may think that they are control codes and are to be left alone.
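
To make the Smart Quote issue concrete, here is a small Python sketch of how the same byte is interpreted under the two encodings. The byte value 0x92 is the Windows-1252 curly apostrophe; the variable names are only illustrative.

```python
import unicodedata

smart_apostrophe = b"\x92"  # single byte as typically pasted from a Windows-1252 source

as_cp1252 = smart_apostrophe.decode("cp1252")   # read as Windows-1252
as_latin1 = smart_apostrophe.decode("latin-1")  # read as strict Latin-1

print(unicodedata.name(as_cp1252))      # the typographic apostrophe
print(unicodedata.category(as_latin1))  # "Cc" means a control character

# What a Latin-1 -> UTF-8 conversion would store, versus what was intended.
converted_as_latin1 = as_latin1.encode("utf-8")
converted_as_cp1252 = as_cp1252.encode("utf-8")
```

So a conversion that is told the data is Latin-1 produces valid UTF-8, but for an invisible control code rather than for the apostrophe the author typed.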

 

"How can I be sure which charset my db really is and which charset the contents of my db really is?"

phpMyAdmin will tell you the schema ("structure") for the database, each table, and each field (column), including what the encoding is and what the collation (sort order) is. You have to browse your data and see if anything odd shows up (invalid character ?-in-black-diamond for UTF-8 fields, multiple funny accented characters for Latin-1 fields) to indicate the wrong data is in there. If you find the wrong data, you will have to figure out what the right character is, and edit it in. If it's not obvious, you'll have to look at the data (text) in an editor with the "wrong" encoding to see what the correct text is. If it's 100% wrong, you might be able to re-import a backup (.sql file) for those tables or fields, declaring the correct encoding for the .sql file, but otherwise you'll have to hand-edit your data.
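
Browsing can be misleading because phpMyAdmin and the browser both reinterpret the bytes. One alternative is to look at the raw bytes, for example the output of MySQL's HEX() function on a problem field, and classify them. The Python helper below is a hypothetical sketch (classify_hex is not part of any tool mentioned here), and the sample hex strings are made up.

```python
def classify_hex(hex_value: str) -> str:
    """Classify the bytes behind a hex dump of a text field.

    Returns "ascii", "utf-8", or "single-byte" (not valid UTF-8, so
    probably Latin-1, Windows-1252 or a similar 8-bit encoding).
    """
    raw = bytes.fromhex(hex_value)
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "single-byte"
    return "utf-8"


print(classify_hex("4A6F686E"))      # "John"
print(classify_hex("426AF8726E"))    # "Bjørn" with ø as the single byte F8
print(classify_hex("426AC3B8726E"))  # "Bjørn" with ø as the two bytes C3 B8
```

A value classified as "utf-8" can still be double-encoded text, so a result like C3 83 C2 B8 where a single ø is expected deserves a second look.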

 

"May it be so (or is it possible) that my db has been in UTF-8 all the time but what I have really done is fill it with latin1-encoded stuff?"

Possible. If you put Latin-1 text (accented characters) into a UTF-8 field, and they weren't rejected with an error at that time, all sorts of bad things can happen down the line (see below).

 

"Latin1 is a subset of unicode?"

Yes, but mostly No. The hierarchy is...

  7-bit ASCII (x00-x7F, including A-Z, a-z, but no accented characters), a subset of
  Latin-1 (ISO-8859-1), adding new control codes in x80-x9F points, and many
    Latin accented characters in xA0-xFF. It is more or less a subset of
  Windows-1252/CP-1252/Western where the 32 control codes at x80-x9F have
    been replaced by "Smart Quotes", mostly typographically desirable
    punctuation marks (such as quotes).

  Unicode itself is a superset of Latin-1. That is, the first 256 code
    points U+0000 through U+00FF are identical to x00 through xFF. Note that this
    does not include the Smart Quotes of Windows-1252, but rather, the 32 control
    codes of Latin-1. UTF-8 is a variable-length encoding of Unicode,
    where everything above the ASCII set (U+0080 and up) is encoded as 2, 3, or
    4 bytes, which are considerably different from the single byte Latin-1 characters
    x80-xFF! This normally results in a savings of space compared to 16-bit Unicode,
    as most of the text is usually plain ASCII (and can be stored in single bytes).
    Unicode != UTF-8.
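
The hierarchy above can be checked directly. This Python sketch shows the UTF-8 byte count for a few sample characters and confirms that Latin-1 bytes map one-to-one onto the first 256 Unicode code points; the sample characters are arbitrary.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


samples = ["a", "æ", "€", "😀"]  # ASCII, Latin-1 range, 3-byte range, 4-byte range
lengths = [utf8_length(ch) for ch in samples]

# Every Latin-1 byte value decodes to the Unicode code point with the same number.
latin1_matches_unicode = all(
    ord(bytes([b]).decode("latin-1")) == b for b in range(256)
)

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```

The code points match, but the stored bytes do not: æ is the single byte E6 in Latin-1 and the two bytes C3 A6 in UTF-8, which is why the two cannot be mixed in one field.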

 

To add to the fun, many browsers conflate Latin-1 and Windows-1252, treating a request for Latin-1 as a request for Windows-1252. This is usually safe to do, as the Latin-1/Unicode control codes x80-x9F have very rarely been seen in the wild. It's much more likely that a "Latin-1" page will include MS Smart Quotes than those rare control codes.

 

If your text data is actually not quite Windows-1252 or Latin-1 (perhaps the Nordic or Baltic variant of Latin-x/ISO-8859-x, which is close to but not exactly the same for a few characters), be careful to check that any translations in-place or during .sql file import are declared correctly, and that phpMyAdmin/MySQL does the translation correctly. I would not simply assume it's Latin-1, or even Windows-1252, just because it's a single byte encoding.

 

If the page is described as UTF-8, it will barf on any Latin-1/Windows-1252 characters in the x80-xFF range, usually giving you the invalid character ?-in-black-diamond glyph instead. If a page is described as Latin-1 or Windows-1252 encoding, a UTF-8 character in the range U+0080-U+00FF will show up as 2 odd accented characters (3 or 4 for higher code points), rather than the desired single accented character.
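
Both display failures can be imitated in Python by decoding bytes with the "wrong" codec. This is only a sketch of what a browser does with mismatched bytes, not of any osCommerce code.

```python
REPLACEMENT = "\ufffd"  # the "?-in-black-diamond" glyph

# Latin-1 bytes shown on a page that is declared UTF-8.
latin1_bytes = "æøå".encode("latin-1")
on_utf8_page = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes shown on a page that is declared Latin-1.
two_byte = "æ".encode("utf-8").decode("latin-1")    # code point below U+0800
three_byte = "€".encode("utf-8").decode("latin-1")  # code point U+0800 or above

print(on_utf8_page, two_byte, len(three_byte))
```

Seeing replacement glyphs therefore points to single-byte data on a UTF-8 page, while pairs such as Ã¦ point to UTF-8 data on a single-byte page.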

 

Further up the food chain, Latin-1/Windows-1252 accented characters in the x80-xFF range can cause the (UTF-8) database to become ill when it tries to handle them (such as when doing a sort). This can result in the text being cut off at the offending character, or even the whole operation returning no text at all. You may or may not see an error message on the screen or in a log. The same thing can happen with PHP string-handling routines (sorts, comparisons, substrings, etc.).


Thanks, @@MrPhil, for sharing your knowledge!
 

"Ah, your apostrophes are almost surely MS Smart Quotes and not either proper ASCII quotes nor proper Unicode quotes. If you copy and pasted from a Word etc. document (Windows-1252, or perhaps in your case, a Nordic or Baltic Windows-125x variant), especially into a Latin-1 browser page, that's very easy to do."

I think you are right. Some of the products descriptions are probably copied from other prepared material.

However, replacing them meant I could import the db without errors, but the results were still the same.

 

When I exported the db using the backup function in osC, and used that sql-file to populate my test db - it all went smoothly without errors. The result, though, was the same as always. When looking at the content in phpMyAdmin, it always looks fine, and no strange characters/placeholders are in sight.

 

(And remember: only some pages refuse to display the content. E.g. the order list at ../admin/index.php can't show customer names that include æøå, but the order list at ../admin/orders.php does it fine.)

 

Could you take a look at my previous post, where I walked through the db and examined the charset used? Did I forget something there?


Latest observations: When I export a db backup from osC, the file is in iso-8859-1 encoding.
 
I converted the file over to utf-8 using iconv in Linux and created my test db from this file. Afterwards, when I used my test osC to download a backup of the db, the file is in iso-8859-1 encoding!
However, if I add the "fix" mentioned above...
 

"tep_db_query("SET character set utf8");" after "tep_db_connect..." in the ..._top-files.


... the downloaded backup file is utf-8.

 

Wouldn't this fact indicate that iso-8859-1 is preferred somewhere in the setup, even if I can't figure out where it is defined?
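
One reading of this observation, offered as an assumption rather than a diagnosis: MySQL converts result text to the character set of the client connection, so a connection left at a Latin-1 default would hand out ISO-8859-1 bytes even from utf8 columns. The Python sketch below models only that idea; send_to_client and page_accepts are hypothetical helpers and the name is made up.

```python
def send_to_client(stored_text: str, connection_charset: str) -> bytes:
    """Hypothetical model: the server transcodes stored text to the connection charset."""
    return stored_text.encode(connection_charset)


def page_accepts(raw: bytes, page_charset: str) -> bool:
    """True if the bytes are valid in the charset the page or backup reader expects."""
    try:
        raw.decode(page_charset)
        return True
    except UnicodeDecodeError:
        return False


stored = "Bjørn"  # held correctly in a utf8 column

default_conn = send_to_client(stored, "latin-1")  # connection left at a Latin-1 default
forced_conn = send_to_client(stored, "utf-8")     # after forcing utf8 on the connection

print(default_conn, page_accepts(default_conn, "utf-8"))
print(forced_conn, page_accepts(forced_conn, "utf-8"))
```

Under this model the database can look perfect in phpMyAdmin while the shop still receives single-byte text, so the connection character set is one more setting worth checking alongside the page, table and column charsets.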


  • 1 year later...

I had a similar problem with Joomla when upgrading from PHP 5.3 to PHP 5.5. It seems that in PHP 5.3 the php.ini setting mbstring.http_input was pass, but in PHP 5.4 it was changed to auto. In WHM, go to PHP Configuration Editor > choose Advanced Mode > find mbstring.http_input and change it to pass. Then save your configuration.

 

If for some reason you do not see any mbstring settings at all, you may have to have your server administrator turn it on or load it for you. If you see mbstring crossed out with Default next to it, don't assume the default setting is turned on. Manually edit the setting. I had to turn it off and then turn it back on for it to work. If you don't have access to your server, contact your host admin and ask them to look into it. Hope this helps. (Also, it's a good idea to take notes when changing your php.ini file on your server. That way, if you turn something off that needs to be on, you can go back without getting lost.)


Archived

This topic is now archived and is closed to further replies.
