How do I delete all characters after a particular part of a string?

After culling URL data from the Web, I am left with a list of strings. This code

StringDrop[StringTake[names[[#]], 100], 22] & /@ {4, 5, 6}

works fine and produces this result (only three are shown): movie titles and years of release, along with other information I do not need. The first 22 characters are always the same, so I can StringDrop them. But I want to get rid of everything after the year in parentheses.

(*
{"Nothing But the Truth (2008) - IMDb</title><meta name=\"description\" content=\"N", 
"In the Loop (2009) - IMDb</title><meta name=\"description\" content=\"In the Loop", 
"Getting to Know You (2020) - IMDb</title><meta name=\"description\" content=\"Get"}
*)

The only things I care about are the titles and the release years. Of course, the length of that part differs from string to string. StringReplacePart and similar functions need fixed character positions. How can I tell Mathematica to drop everything after (xxxx), where xxxx is the year of release?

Thanks

POSTED BY: Roger J Brown
3 Replies

I had just gotten this to work when I received notification of your replies (len is the length of the list):

shtStr = StringDrop[StringTake[names[[#]], 100], 22] & /@ Range[len];
StringDrop[
   shtStr[[#]], -(81 - 
      StringPosition[shtStr[[#]], {"IMDb"}][[1, 1]])] & /@ Range[len]
POSTED BY: Roger J Brown
Posted 2 years ago
data =
  {"Nothing But the Truth (2008) - IMDb</title><meta name=\"description\" content=\"N",
   "In the Loop (2009)-IMDb</title><meta name=\"description\" content=\"In the Loop",
   "Getting to Know You (2020)-IMDb</title><meta name=\"description\" content=\"Get"};
StringReplace[data, RegularExpression["(\\(\\d{4,4}\\)).*"] -> "$1"]

{"Nothing But the Truth (2008)", "In the Loop (2009)", "Getting to Know You (2020)"}

This adds a bit more robustness by matching on the parenthesized year, so parentheses in the actual movie title are more likely to be preserved. For example:

StringReplace[
    "(500) Days of Summer (2009) - IMDb</title><meta name=\"description\" content=\"N", 
    RegularExpression["(\\(\\d{4,4}\\)).*"] -> "$1"]

(500) Days of Summer (2009)

Now, had that been (5000) Days of Summer, then we'd need to add more to our pattern matcher.
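
One possible way to extend the pattern for that case is to require the "- IMDb" suffix right after the year. This is only a sketch: it assumes every title is followed by "- IMDb" (with or without surrounding spaces), as in the sample data, and the names `sample` and `cleaned` are made up for illustration.

```wolfram
(* Hypothetical input: the title itself starts with four digits in parentheses *)
sample = "(5000) Days of Summer (2009) - IMDb</title><meta name=\"description\" content=\"N";

(* Match a parenthesized year only when "- IMDb" follows it, then drop the rest *)
cleaned = StringReplace[sample,
  RegularExpression["(\\(\\d{4}\\))\\s*-\\s*IMDb.*"] -> "$1"]

(* cleaned is "(5000) Days of Summer (2009)" *)
```

The leading "(5000)" is left alone because it is not followed by "- IMDb", so only the real year anchors the match.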

Rather than dropping the first 22 characters and doing string matching, you might instead want to treat this as structured data (is it XML?) and extract the title and year directly from the tags.
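
If the raw page source is still available, reading the title straight from its tag might look like the sketch below. The `html` string is a made-up stand-in for a downloaded page, and the pattern assumes the title text always ends in " - IMDb".

```wolfram
(* Hypothetical page source; the real pages will contain much more markup *)
html = "<html><head><title>In the Loop (2009) - IMDb</title><meta name=\"description\" content=\"In the Loop\"></head></html>";

(* Capture the shortest text between the opening tag and the " - IMDb" suffix *)
titles = StringCases[html,
  "<title>" ~~ Shortest[t__] ~~ " - IMDb</title>" :> t]

(* titles is {"In the Loop (2009)"} *)
```

This avoids counting characters altogether, so neither the 22-character prefix nor the 100-character cutoff needs to be known.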

POSTED BY: Eric Rimbey
Posted 2 years ago

Try this

    list={"Nothing But the Truth (2008) - IMDb</title><meta name=\"description\" content=\"N\"", 
      "In the Loop (2009) - IMDb</title><meta name=\"description\" content=\"In the Loop\"", 
      "Getting to Know You (2020) - IMDb</title><meta name=\"description\" content=\"Get\""};
    Map[StringTake[#,StringPosition[#,")"][[1,1]]]&,list]

which returns the list

    {"Nothing But the Truth (2008)","In the Loop (2009)","Getting to Know You (2020)"}
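
Cutting at the first ")" assumes the title itself contains no parentheses. A variant of the same StringTake idea, sketched here with made-up sample strings in `withParens`, cuts at the end of the first four-digit year in parentheses instead:

```wolfram
(* Hypothetical inputs, one of which has parentheses in the title *)
withParens = {"(500) Days of Summer (2009) - IMDb</title>",
   "In the Loop (2009) - IMDb</title>"};

(* StringPosition gives {start, end} pairs; [[1, 2]] is the end of the first year match *)
trimmed = Map[
  StringTake[#, StringPosition[#, RegularExpression["\\(\\d{4}\\)"]][[1, 2]]] &,
  withParens]

(* trimmed is {"(500) Days of Summer (2009)", "In the Loop (2009)"} *)
```

Like the original, this fails with a Part error on any string that has no match, so it may be worth checking the input first.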
POSTED BY: Bill Nelson