CloudDeploy function returning error

Posted 2 years ago

Hi,

I've created a very simple API that imports the "Plaintext" of a URL or a PDF. Here is my code:

urlScraperTest = APIFunction[
   {"url" -> "String"},
   Import[#url, "Plaintext"] &];

And deployed it to the Wolfram Cloud:

CloudDeploy[urlScraperTest, "urlScraper", Permissions -> "Public", CloudObjectNameFormat -> "CloudUserUUID"]

CloudObject["https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraper"]

I want to be able to call this API from programs outside of MMA. In testing, the code works as expected for some URLs. For instance, here is a call for a recipe page that I am able to scrape directly from my browser:

https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraper?url=https://www.foodandwine.com/recipes/beef-wellington

I'm also testing this by passing the URL of a PDF that I've stored in a cloud location: https://bbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io/f1703793538605x556426251682794500/Perfect%20Chocolate%20Chip%20Cookies%20Recipe%20-%20NYT%20Cooking.pdf

If I enter this location in my browser's address bar directly, the PDF renders correctly. But if I pass this URL to my Wolfram Cloud API function:

https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraper?url=https://bbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io/f1703793538605x556426251682794500/Perfect%20Chocolate%20Chip%20Cookies%20Recipe%20-%20NYT%20Cooking.pdf

it returns $Failed. Note: if I call the urlScraperTest function directly from a local MMA notebook using the above URL, it returns the results I expect with no error. But I need to be able to call this function from outside of MMA.

I believe the problem is that the URL above contains "illegal" characters for URL encoding, but I've been unable to find a solution. Can anyone suggest a solution?

Thanks!
Mark

POSTED BY: Mark Coleman
Posted 9 days ago

Mark,

Your diagnosis is exactly right: the problem is related to URL encoding.

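When you pass a URL that itself contains encoded characters (like %20 for a space) as the value of a query parameter (?url=...), that whole value has to be percent-encoded again for the outer request. A browser address bar generally does not do this for you, so the number of encoding layers the client sends may not match the number the server removes, and the string that reaches your function can differ from the URL you started with.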

The key difference is that a local call to urlScraperTest hands your string to Import unchanged, whereas a call through the Cloud API first passes it through HTTP query-string encoding and decoding, so Import may receive a different string from the one you typed.
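The difference between the local call and the Cloud call can be illustrated in a few lines. This sketch assumes that the server decodes the url query value exactly once, with URLDecode standing in for that step; localInput and cloudInput are hypothetical names for the string each path hands to your function.

```mathematica
(* The PDF link from the question, with %20 escapes *)
pdfUrl = "https://bbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io/f1703793538605x556426251682794500/Perfect%20Chocolate%20Chip%20Cookies%20Recipe%20-%20NYT%20Cooking.pdf";

(* Local call: the string reaches Import exactly as typed *)
localInput = pdfUrl;

(* Browser call with ?url=<pdfUrl> pasted unencoded: assuming the server
   decodes the query value once, this is what the API function sees *)
cloudInput = URLDecode[pdfUrl];

localInput === cloudInput        (* False *)
StringCount[cloudInput, " "]     (* 7: every %20 has become a literal space *)
```

Under that assumption, the two calls hand Import different strings, which would explain why the same link behaves differently locally and in the Cloud.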

Solution

A simple way to handle this inside the API is to apply the URLDecode function explicitly inside your APIFunction. The intent is that whatever string the client sends as the value of the url parameter is decoded back into the original URL string before Import tries to fetch it.

urlScraperFixed = APIFunction[
   {"url" -> "String"},
   Import[URLDecode[#url], "Plaintext"] &
];

Explanation

When a client calls your endpoint with the PDF URL: .../urlScraper?url=https://.../Perfect%20Chocolate%20Chip%20Cookies%20Recipe%20-%20NYT%20Cooking.pdf

The Wolfram Cloud's APIFunction receives the value of the url parameter, which is the full URL string (including the %20).

We apply URLDecode to the input #url. The intent of this step is to strip any percent-encoding left over from the client's request, so that Import receives a plain URL string.

The decoded URL string is then passed to Import, which should then be able to fetch the resource and return the plaintext content of the PDF.
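The decoding step described above can be checked locally without redeploying. The sketch below assumes the API may receive the link in one of two forms: with its %20 escapes intact, or already decoded once (spaces instead of %20). It shows that URLDecode leaves the second form unchanged, and defines normalizeUrl, a hypothetical helper (not part of the fix above) that re-escapes spaces after decoding. That helper is only safe for links whose sole escapes are encoded spaces, as in this PDF link.

```mathematica
(* The PDF link from the question *)
pdfUrl = "https://bbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io/f1703793538605x556426251682794500/Perfect%20Chocolate%20Chip%20Cookies%20Recipe%20-%20NYT%20Cooking.pdf";

(* Form 2: already decoded once. No % escapes remain, so a second
   URLDecode leaves the string, spaces included, unchanged *)
alreadyDecoded = URLDecode[pdfUrl];
URLDecode[alreadyDecoded] === alreadyDecoded   (* True *)

(* Hypothetical helper: decode, then re-escape literal spaces.
   Assumes spaces are the only characters that need escaping. *)
normalizeUrl[s_String] := StringReplace[URLDecode[s], " " -> "%20"]

(* Both forms map back to the original link *)
normalizeUrl[pdfUrl] === pdfUrl                (* True *)
normalizeUrl[alreadyDecoded] === pdfUrl        (* True *)
```

If the deployed API turns out to receive the second form, using Import[normalizeUrl[#url], "Plaintext"] & inside the APIFunction is one option to try; it has not been tested against the Cloud here.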

Deployment

Redeploy the corrected function using the same deployment command:

CloudDeploy[urlScraperFixed, "urlScraper", Permissions -> "Public", CloudObjectNameFormat -> "CloudUserUUID"]

This fixed API function should now successfully handle both simple web URLs and complex URLs like your PDF link when called from outside the Wolfram Language.
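When the API is called from outside the Wolfram Language, the calling program can percent-encode the nested URL before appending it as the url parameter. Its %20 escapes then travel as %2520, and after the server's single decoding pass the url parameter should hold the link with %20 intact, the same string the local test in the question passed to Import. The sketch below builds such a request string with URLEncode; apiUrl is the deployed object from the question and callUrl is a hypothetical name. In another language the same step is done by the standard query-value encoder, for example urllib.parse.quote(pdf_url, safe="") in Python.

```mathematica
(* Deployed endpoint from the question *)
apiUrl = "https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraper";

(* The PDF link from the question *)
pdfUrl = "https://bbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io/f1703793538605x556426251682794500/Perfect%20Chocolate%20Chip%20Cookies%20Recipe%20-%20NYT%20Cooking.pdf";

(* Encode the whole nested URL so %, : and / are escaped as a query value *)
callUrl = apiUrl <> "?url=" <> URLEncode[pdfUrl];

(* The nested %20 escapes now appear as %2520 *)
StringContainsQ[callUrl, "%2520"]

(* Sending the request needs network access, so it is left commented out *)
(* URLExecute[callUrl] *)
```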

POSTED BY: Rob Pacey