I have spent the past year or so developing a complex web application using PHP, AJAX and a SQL Server database. My client now wants to have a French version of the application with the possibility of supporting other languages in the future. Although it would have been easier to design in this requirement at the start of the project, retroactively implementing it should not cause too many problems. The obvious (and traditional) way of implementing multi-language support is via a database table containing a key field and the phrase in each required language in subsequent fields. Something like:

| Key | English | French |
| --- | --- | --- |
| hello | Hello | Bonjour |
| howru | How are you? | Comment ça va? |
| fine | Fine thank you | Ça va bien merci |

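
A minimal sketch of such a lookup, written in Python for brevity rather than the application's PHP. An in-memory SQLite database stands in for SQL Server, and the table name, column names and `get_phrase` helper are all hypothetical:

```python
import sqlite3

# In-memory SQLite stands in for the real SQL Server database.
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phrases (phrase_key TEXT PRIMARY KEY, english TEXT, french TEXT)"
)
conn.executemany(
    "INSERT INTO phrases VALUES (?, ?, ?)",
    [
        ("hello", "Hello", "Bonjour"),
        ("howru", "How are you?", "Comment ça va?"),
        ("fine", "Fine thank you", "Ça va bien merci"),
    ],
)

# Whitelist of language columns, so a column name is never built from user input.
LANGUAGE_COLUMNS = {"en": "english", "fr": "french"}


def get_phrase(key, lang):
    """Return the phrase for `key` in `lang`, or None if there is no entry."""
    column = LANGUAGE_COLUMNS[lang]  # raises KeyError for unsupported languages
    row = conn.execute(
        f"SELECT {column} FROM phrases WHERE phrase_key = ?", (key,)
    ).fetchone()
    return row[0] if row else None
```

Here `get_phrase("hello", "fr")` returns "Bonjour", and a key with no row returns `None`.
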
And this is just fine for simple phrases. However, suppose we need a phrase such as "Query returned 12 results". We could add a database record to the table containing:

| Key | English | French |
| --- | --- | --- |
| querynum | Query returned 12 results | Requête retourné 12 résultats |

Except of course, what happens if the query returns 9 results or 1 or none? It doesn't make sense to have a separate phrase in the database for each possible number of results so we might implement something like:

| Key | English | French |
| --- | --- | --- |
| querynum | Query returned @1 results | Requête retourné @1 résultats |

Where the @1 is replaced with the actual number of results in an operation after the phrase has been returned from the database. If there is more than one variable to be inserted into the phrase, this can be done using @2, @3, etc. If it's necessary to incorporate an @ into the phrase, this can be done using @@. So far, so good, except when only one result is returned. In this case, the required message in English needs to be "Query returned 1 result", where "result" is now singular, not plural.
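
The placeholder substitution described above could be sketched as follows, in Python for illustration. The function name `substitute` is an assumption, as is the choice to raise an error when a placeholder has no matching value:

```python
import re

# Matches either an escaped '@@' or a numbered placeholder such as '@1'.
_PLACEHOLDER = re.compile(r"@(@|\d+)")


def substitute(template, *values):
    """Replace @1, @2, ... with the matching value; @@ becomes a literal @."""

    def replace(match):
        token = match.group(1)
        if token == "@":
            return "@"
        index = int(token) - 1  # @1 refers to the first value
        if not 0 <= index < len(values):
            raise ValueError(f"no value supplied for @{token}")
        return str(values[index])

    # A single pass means inserted values are never re-scanned for placeholders.
    return _PLACEHOLDER.sub(replace, template)
```

For example, `substitute("Query returned @1 results", 12)` gives "Query returned 12 results", and `substitute("Mail @1@@example.com", "bob")` gives "Mail bob@example.com".
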
This can be handled in a similar manner to the insertion of variables. In the phrase for English, we could include a construct such as "|@1|result|results|" so that if variable @1 is 1, it returns the first word, otherwise the second. In English, 0 is treated as plural, and so is -1. Don't ask me why, I don't pretend to be a philologist. This is not true for all languages (French, for one, conventionally treats 0 as singular: "0 résultat"), so for completeness the substitution needs to be able to cope with all the cases: <-1, -1, 0, 1, >1. This then suggests that the database entry for this phrase should contain:

| Key | English | French |
| --- | --- | --- |
| querynum | Query returned @1 \|@1\|results\|results\|results\|result\|results\| | Requête retourné @1 \|@1\|résultats\|résultats\|résultat\|résultat\|résultats\| |

and the substitution will then return the expected phrase for any value of @1 in English and French. Languages with more than two plural forms (Russian, for example) would need further cases.
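
A rough Python sketch of how the five-form construct might be resolved. The name `choose_forms` is hypothetical, and the sketch assumes the variable is an integer. The construct has to be resolved before the plain @1 substitution, otherwise the "|@1|" marker would itself be overwritten with the number:

```python
import re

# |@n|form for < -1|form for -1|form for 0|form for 1|form for > 1|
_PLURAL = re.compile(r"\|@(\d+)\|" + r"([^|]*)\|" * 5)


def choose_forms(template, *values):
    """Resolve each five-form plural construct against its numeric variable."""

    def replace(match):
        number = values[int(match.group(1)) - 1]  # @1 is the first value
        if number < -1:
            slot = 2
        elif number == -1:
            slot = 3
        elif number == 0:
            slot = 4
        elif number == 1:
            slot = 5
        else:
            slot = 6
        return match.group(slot)

    return _PLURAL.sub(replace, template)


template = "Query returned @1 |@1|results|results|results|result|results|"
one = choose_forms(template, 1)      # "Query returned @1 result"
many = choose_forms(template, 12)    # "Query returned @1 results"
```

The remaining plain @1 is then filled in by the ordinary variable substitution.
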
If the phrase to be substituted is a name rather than a value and if the text is a noun, then problems arise in translation to French depending on whether the substituted text is a masculine or feminine noun. For example, the basic phrase "The @1 is green" can in English be used for "The tree is green", "The car is green" or "The dog is green". However, the translations into French are respectively: "L'arbre est vert", "La voiture est verte" and "Le chien est vert". This is exceptionally tricky and just as complex in languages other than French! My application needs to be able to do exactly this, although not for the sort of trite phrases I use as examples.
So, this means an entirely different approach is called for to solve this type of problem. The simplest way of solving it is to generate the required phrase in English and then use an online translation tool to return the French version. So in our database, instead of using a direct translation inside the table, we need a flag to indicate that the English phrase needs to be generated first and then translated. Something like:

| Key | English | French |
| --- | --- | --- |
| phrase | The @1 is green | @English\|Google |

Where the French entry means generate the English phrase and then use Google Translate to prepare the French version. Both Google and Microsoft (the two best online translators) have AJAX implementations of their translation tool available. Better translation engines may come along in the future, so adding the translator after the originating language allows future-proofing and allows us to select the better translation for each individual phrase. The only significant drawback of this approach is one of performance. Making 50 or so AJAX calls to translate various phrases one at a time cannot be done instantaneously. Performance issues will need to be addressed once a prototype has been constructed.
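
One way the "@English|Google" entry could be recognised and dispatched is sketched below in Python. The names `parse_directive` and `render` are hypothetical, and the `engines` mapping holds caller-supplied functions; no real translation service API is shown or implied:

```python
import re

# '@' + source language + '|' + engine name, e.g. '@English|Google'
_DIRECTIVE = re.compile(r"@([A-Za-z]+)\|([A-Za-z]+)")


def parse_directive(entry):
    """Return (source_language, engine) for a directive entry, else None."""
    match = _DIRECTIVE.fullmatch(entry.strip())
    return match.groups() if match else None


def render(entry, english_phrase, target, engines):
    """Return a fixed translation as-is, or hand the English phrase to an engine.

    `engines` maps an engine name to a callable (text, source, target) -> str.
    Those callables are assumed wrappers around whichever service is chosen.
    """
    directive = parse_directive(entry)
    if directive is None:
        return entry  # an ordinary, hand-written translation
    source, engine = directive
    return engines[engine](english_phrase, source, target)
```

Because the engine is looked up by name, swapping in a different translator for a single phrase only means changing that phrase's database entry.
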
Using automatic translation should be the default action if a phrase is not in the database so most of the simple examples given above can be handled using this method. It's only when auto translation fails to provide a good result that we need to resort to specific translations and include them in the database table. One example where auto translation is likely to come unstuck is for menu items where the phrase for each action is short and may contain subtleties. Also, of course, translation failures may be more prevalent in languages other than French. For these, auto translation would be the initial default with specific translations added as and when translation failures are discovered.
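
The fallback order described here might be sketched as follows. `make_resolver` and `auto_translate` are hypothetical names, and the in-memory cache is an assumption added for illustration, aimed at the performance concern about repeated translation calls:

```python
def make_resolver(fixed_translations, auto_translate):
    """Build a lookup that prefers fixed translations and caches automatic ones.

    fixed_translations: dict mapping (english, target) to a specific translation.
    auto_translate: callable (english, target) -> str, assumed to wrap an
    online translator.
    """
    cache = {}

    def resolve(english, target):
        fixed = fixed_translations.get((english, target))
        if fixed is not None:
            return fixed  # a specific translation always wins
        if (english, target) not in cache:
            cache[(english, target)] = auto_translate(english, target)
        return cache[(english, target)]

    return resolve
```

With this arrangement each distinct phrase reaches the translator at most once per language, and adding a specific translation later overrides the automatic one without any other change.
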
For phrases and words where automatic translation is less than perfect, we need a further refinement. My application is based in the oil industry and works with the physical properties of oil such as "Cloud point", "Conradson carbon content", "Kinematic Viscosity at 50C", etc. Translating these phrases automatically often results in errors. So, when I need to output a message along the lines of "The value provided for @1 is out of the valid range", where @1 is the physical property name then I want to be able to recursively search the database to find the property name in the right language and then substitute that part of the automatically translated phrase with the better translation. The algorithm to do this might consist of:
1. Use the automatic translation to translate the entire phrase including the property name.
2. Use the automatic translation to translate the property name on its own.
3. Look up the better translation of the property name in the database table.
4. Search the phrase generated in step 1 for the property name as translated in step 2 and replace it with the better translation from step 3.
This is not a foolproof algorithm. It assumes that the translation of the property name on its own will be the same as when it is incorporated within a sentence.
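
The steps above can be sketched in Python as follows. The function and parameter names are hypothetical, `auto_translate` stands for any caller-supplied wrapper around an online translator, and `glossary` stands for the database table of better translations. The case-insensitive match is an added assumption, since an engine may capitalise the term differently inside a sentence:

```python
import re


def translate_with_glossary(phrase, term, target, auto_translate, glossary):
    """Translate `phrase`, then swap the engine's rendering of `term`
    for the preferred translation held in `glossary`."""
    full = auto_translate(phrase, target)  # translate the whole phrase
    better = glossary.get((term, target))  # look up the better translation
    if better is None:
        return full  # nothing better on record, keep the automatic result
    # Translate the term alone (only needed once a better translation exists).
    auto_term = auto_translate(term, target)
    # Search and replace; a function is used as the replacement so that
    # backslashes in `better` are not treated as regex escapes.
    pattern = re.compile(re.escape(auto_term), re.IGNORECASE)
    return pattern.sub(lambda _match: better, full, count=1)
```

If the engine renders the term differently inside the sentence, the pattern simply fails to match and the automatic translation is returned unchanged, which is exactly the weakness noted above.
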
Within the database, the entries required to initiate this process of partial automatic translation may consist of:

| Key | English | French |
| --- | --- | --- |
| phrase | The value provided for @1@2 is out of the valid range | @English\|Google\|@1 |
| propkv | Kinematic Viscosity | viscosité cinématique |

The @1 in the English version contains the property name (in this case "Kinematic Viscosity") and @2 contains the qualifier (in this case " at 50C"). If there is no qualifier value, then @2 will be empty. The @1 in the French column indicates that there may be a fixed translation for this part of the phrase within the database and that this should be handled using the above algorithm.
As with the automatic translation programs, a feature can be added into my application so that users can suggest better translations than the automatic (or defined) version. These would be put in the database and flagged as interim translations. An administrator can then authorize (or discard) the user suggestions.
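
A small Python sketch of how such suggestions might be tracked. The `Suggestion` class, its field names and the status values are hypothetical, and the sketch assumes an interim suggestion is not used for display until an administrator authorizes it:

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    key: str
    language: str
    text: str
    suggested_by: str
    status: str = "interim"  # stays "interim" until an administrator decides


def review(suggestion, approve, translations):
    """Apply an administrator's decision to an interim suggestion.

    `translations` maps (key, language) to the translation actually in use.
    """
    if suggestion.status != "interim":
        raise ValueError("suggestion has already been reviewed")
    if approve:
        translations[(suggestion.key, suggestion.language)] = suggestion.text
        suggestion.status = "authorized"
    else:
        suggestion.status = "discarded"
    return suggestion
```

Keeping discarded suggestions on record, rather than deleting them, would let an administrator see what has already been rejected for a given phrase.
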
And I think that should do it. Some minor refinements may be required along the way; if so, I'll report back with them. Do you think it will work?