Problem formatting imported URL plaintext

Posted 10 months ago

Greetings:

I created a simple Wolfram Cloud API that calls Import on a URL and returns the plaintext:

ClearAll[urlScraperTest2]
urlScraperTest2 = APIFunction[
   {"url" -> "String"},
   Import[#url, "Plaintext", CharacterEncoding -> "UTF-8"] &];
CloudDeploy[urlScraperTest2, "urlScraperTest2", 
 Permissions -> "Public", CloudObjectNameFormat -> "CloudUserUUID"]

CloudObject["https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraperTest2"]

I was testing the API using a common recipe website: https://anewsletter.alisoneroman.com/p/honk-if-you-love-stuffing

Within Mathematica ("14.0.0 for Microsoft Windows (64-bit) (December 13, 2023)"), the API works as expected. All of the formatting is correct as seen on the webpage.

When I call the API directly from an empty Chrome browser window, i.e., https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraperTest2?url=https://anewsletter.alisoneroman.com/p/honk-if-you-love-stuffing, I get a result full of escape characters. Here are the first few lines of the output:

"a newsletter\n\nSubscribe Sign in\n\nShare this post\n\n\nHonk If You Love Stuffing\nanewsletter.alisoneroman.com\n\nCopy link\n\nFacebook\n\nEmail\n\nNote\n\nOther\nHonk If You Love Stuffing honk\n\nAlison Roman\n\nNov 9, 2023\n\:2219 Paid\n\n136\n\nShare this post\n\n\nHonk If You Love Stuffing\nanewsletter.alisoneroman.com\n\nCopy link\n\nFacebook\n\nEmail\n\nNote\n\nOther\n21\n\nShare\n\n\n\nHello and welcome to Thanksgiving Week on A Newsletter. For the first installment, head HERE . If you\[CloseCurlyQuote]ve found your way over by some miracle but are not yet subscribed, let me help you with that:\n\nSubscribe\n\nStuffing is the most important thing on the table.\n\n\nIs that\[Ellipsis]..round stuffing? photo by Chris Bernabeo\nIf you don\[CloseCurlyQuote]t agree, I can\[CloseCurlyQuote]t relate. While I could write a book on the subject (scroll down\[LongDash] I sort of did!) TLDR; The stuffing does not go into the bird (ever). The stuffing does not have meat, though you could add it (sausage, bacon). The stuffing has not been tested with gluten-free bread, though I\[CloseCurlyQuote]m sure it would be good if you\[CloseCurlyQuote]re accustomed to gluten-free bread. And since you\[CloseCurlyQuote]ve asked, I do not have a cornbread stuffing recipe though I do have a great cornbread recipe (it\[CloseCurlyQuote]s in Sweet Enough )"

Note: Open the URL given above to see the full output.

Is it possible to have the API return the result formatted as it appears in a Mathematica notebook? I am presuming this is an issue with character encoding but I haven't had success in decoding it. For background, I am using this API for a workflow in a program outside of Mathematica, and the proper formatting is an important issue.

Thanks!

POSTED BY: Mark Coleman
Posted 10 months ago

The full output is too long, so I am only posting the first few lines.

In[1]:= ToExpression[Import["https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraperTest2?url=https://anewsletter.alisoneroman.com/p/honk-if-you-love-stuffing"]]
Out[1]= a newsletter

Subscribe Sign in

Share this post
POSTED BY: Zihan Li
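
The browser output in the question looks like the Wolfram Language string form of the result (a quoted string with \n for line breaks and named-character codes such as CloseCurlyQuote), which would explain why ToExpression recovers the formatted text in the reply. For a client outside Mathematica, one option may be to request an explicit output format through the third argument of APIFunction. The following is a minimal sketch, assuming that the "Text" export format makes the API return the raw string; the name urlScraperText is hypothetical, and the CloudObjectNameFormat option from the original deployment is left out for brevity.

```wolfram
(* Sketch only: the API from the question with an explicit result format. *)
(* Assumption: "Text" as the third argument returns the plain string instead
   of its Wolfram Language string representation. *)
ClearAll[urlScraperText]  (* hypothetical name, not from the thread *)
urlScraperText = APIFunction[
   {"url" -> "String"},
   Import[#url, "Plaintext", CharacterEncoding -> "UTF-8"] &,
   "Text"];
CloudDeploy[urlScraperText, "urlScraperText", Permissions -> "Public"]

(* Local illustration of what ToExpression does in the reply: it parses the
   quoted, escaped string back into a string with a real curly quote and
   real line breaks. *)
ToExpression["\"you\\[CloseCurlyQuote]ve\\n\\nSubscribe\""]
```

If the assumption holds, requesting the deployed object with the same ?url=... query should return text with real line breaks. The character encoding of the response is still worth checking on the client side.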