Using TextTranslation to make WolframAlpha work with ANY language

Posted 5 months ago

One of my pet peeves about Wolfram|Alpha has been that it only works well with English queries, and not very well (if at all) in other languages.

So when we included a new translation function in version 11.1, TextTranslation, I was thinking about using it with Wolfram|Alpha.

I am not sure how I got sidetracked, but I finally got around to looking into this. The results are quite amazing, so I want to share a few of them here.

If you want to try it out for yourself today, there is a very easy way to do so: install the following paclet from my GitHub page:

PacletInstall["https://github.com/arnoudbuzing/prototypes/releases/download/v0.2.3/Prototypes-0.2.3.paclet"]

Or, if you don't want to install the full paclet (it includes dozens of other functions as well), you can simply use the code definitions from here:

https://github.com/arnoudbuzing/prototypes/blob/master/Prototypes/WolframAlpha.wl

So let's take a look at some examples and how they compare to the untranslated WolframAlpha function. Let's say we want to ask Wolfram|Alpha what the capital of Japan is (so: Tokyo). Clearly the English query works:

WolframAlpha["what is the capital of Japan", "Result"]

(gives Tokyo as the answer). But now let's ask the question in Dutch:

WolframAlpha["wat is de hoofdstad van Japan", "Result"]

Now you get a very very strange answer: 18.4 million vehicles (2004 estimate). This is clearly very wrong and makes Wolfram|Alpha completely useless for Dutch queries.

So now let's think about what is required to improve this situation: We first need to see if we're dealing with a non-English query and, if so, we need to translate it to an English query and run that through the WolframAlpha function. Here is the Wolfram Language code that does exactly that. I named it WolframBeta to distinguish it from the original function:

WolframBeta[ input_String, args___ ] := Module[{language, translation},
  language = LanguageIdentify[input];
  translation = If[language =!= Entity["Language", "English"], TextTranslation[input, language -> "English"], input];
  WolframAlpha[translation, args]
]
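To see what each stage contributes, the same three steps can be run one at a time. This is only a sketch for inspection: the variable names query and english are mine, and every step needs network access, so the timings and exact outputs may differ from run to run.

```wolfram
(* Run the three stages of WolframBeta separately to inspect the intermediate values *)
query = "wat is de hoofdstad van Japan";

(* Step 1: detect the language of the query; this gives a language entity *)
language = LanguageIdentify[query]

(* Step 2: translate to English only when the query is not already English *)
english = If[language =!= Entity["Language", "English"],
  TextTranslation[query, language -> "English"],
  query]

(* Step 3: send the English text to Wolfram|Alpha *)
WolframAlpha[english, "Result"]
```

If the final answer looks wrong, checking the values of language and english usually shows whether the detection or the translation step is responsible.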

Now let's try this function:

WolframBeta["wat is de hoofdstad van Japan", "Result"]

It's a little slower due to the call to the translation function, but it actually gives the right result (Tokyo)!

And it immediately works for many languages:

WolframBeta["cual es la capital de japón?", "Result"]
WolframBeta["日本の首都は何ですか", "Result"]
WolframBeta["Was ist die Hauptstadt von Japan?", "Result"]
WolframBeta["什么是日本的首都", "Result"]

These all return Tokyo as well, whereas the WolframAlpha function fails in unique ways for each language! (I won't include the results here, but they are embarrassing)

(To get these languages I translated from English, so I hope these are correct ... But at least Spanish and German seem reasonable to me)

WolframAlpha can also return "spoken results": a plain English-language string that states the answer. Here is an example:

WolframAlpha["what is the capital of Japan?", "SpokenResult"]

This returns an English-language string: "The capital city of Japan is Tokyo, Japan"

But TextTranslation works both ways (I think it actually works between any two languages, but in this context only translation to and from English matters).

So here is a modification which a) translates non-English to English, b) does the query, and c) translates the spoken result back into the original language:

WolframBeta[ input_String, "SpokenResult", args___ ] := Module[{language, translation,result},
  language = LanguageIdentify[input];
  translation = If[language =!= Entity["Language", "English"], TextTranslation[input, language -> "English"], input];
  result = WolframAlpha[translation, "SpokenResult", args];
  If[language =!= Entity["Language", "English"], TextTranslation[result, "English" -> language], result]
]
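One thing the definition above does not handle is a query for which no spoken result comes back, or a translation step that fails. I am assuming here that in those cases the functions return something other than a string. The variant below is a minimal sketch under that assumption; the name WolframBetaSafe is hypothetical and not part of the paclet.

```wolfram
(* Hypothetical variant: only translate when there is actually a string to translate *)
WolframBetaSafe[input_String, "SpokenResult", args___] :=
 Module[{language, english, translation, result},
  language = LanguageIdentify[input];
  english = language === Entity["Language", "English"];
  translation = If[english, input, TextTranslation[input, language -> "English"]];
  (* If the translation did not produce a string, return it unchanged and skip the query *)
  If[! StringQ[translation], Return[translation, Module]];
  result = WolframAlpha[translation, "SpokenResult", args];
  (* Translate back only for non-English input and only when a string came back *)
  If[! english && StringQ[result],
   TextTranslation[result, "English" -> language],
   result]
 ]
```

The calling pattern is the same as before, for example WolframBetaSafe["wat is de hoofdstad van Japan", "SpokenResult"].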

Now let's take a look at a few more examples in Dutch, my mother (and father) tongue. First, the distance from Amsterdam to Rotterdam, two Dutch cities:

WolframBeta["hoe ver is het van Amsterdam naar Rotterdam in kilometers?", "SpokenResult"]

This gives (correctly): "Het antwoord is zo'n 56.4 kilometer"

And how much vitamin C is in a glass of orange juice:

WolframBeta["hoeveel vitamine C in een glass jus d'orange?", "SpokenResult"]

With the following result: "Het antwoord is ongeveer 93 milligram"

And, something (GDP of the Netherlands) with a slightly more complex answer, at least grammatically (this one really amazed me):

WolframBeta["wat is het bruto binnenlands product van Nederland?", "SpokenResult"]

Answer: "Het bruto binnenlands product van Nederland is ongeveer 777 miljard dollar per jaar"

Sometimes, for example when asking for the weather, you need to provide the country in addition to the city:

WolframBeta["hoe warm is het in Alkmaar, Nederland?", "SpokenResult"]

Answer (in Fahrenheit...): "De temperatuur in Alkmaar, Noord-Holland, Nederland is 59 graden Fahrenheit"

And mathematical queries can also be answered:

WolframBeta["wat is de afgeleide van sin x", "SpokenResult"]

Answer: "Het antwoord is de cosinus van x"

And questions about famous people and their relation to other people (I hope the answer is right, I am not a big royalist):

WolframBeta["wie zijn de kinderen van prins William?", "SpokenResult"]

Answer: "De kinderen van Prince William zijn prins George van Cambridge; Prinses Charlotte van Cambridge; en Lodewijk Alexander van Cambridge"

I am hoping this, or a version of this idea, can be added to the official Wolfram|Alpha at some point. I definitely think it would help get Wolfram|Alpha used more all over the world and help people with their computational curiosity!

Let me know what you think. Comments, suggestions, and pull requests are always welcome!

3 Replies

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile!

Great idea. The code is only as good as LanguageIdentify[], though. Given enough input, it works well enough, but it is easily fooled by typos and ambiguities.

Still, way better than the current functionality.

Yes, it is far from perfect, but I was also pleasantly surprised by how much improvement you do get. For very short queries and math-related queries, the language identification can be problematic. And of course things may get 'lost in translation', but this is a general problem with machine-generated translations (which we can expect to improve over time).
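For the short or math-heavy queries where identification goes wrong, one possible workaround is to let the caller name the source language so that LanguageIdentify is skipped entirely. This is just a sketch: WolframBetaFrom is a hypothetical name that is not in the paclet, and I am assuming TextTranslation accepts a plain language name such as "Dutch" on the left-hand side of the rule.

```wolfram
(* Hypothetical form with an explicit source language; LanguageIdentify is never called *)
WolframBetaFrom[input_String, language_, args___] :=
 WolframAlpha[
  If[language === "English" || language === Entity["Language", "English"],
   input,
   TextTranslation[input, language -> "English"]],
  args]

(* Usage: the second argument is the language of the query *)
WolframBetaFrom["wat is de afgeleide van sin x", "Dutch", "Result"]
```

This trades convenience for reliability: the caller has to know the language up front, but a typo or an ambiguous two-word query can no longer send the text through the wrong translation.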
