WOLFRAM COMMUNITY

GROUPS:

Wolfram Language

How do I delete all characters after a particular part of a string?

Roger J Brown, University of Maryland

Posted 2 years ago

After culling URL data from the Web, I am left with a list of strings. This code

    StringDrop[StringTake[names[[#]], 100], 22] & /@ {4, 5, 6}

works fine to produce the result below (only three are shown): movie titles and years of release, along with other information I do not need. The first 22 characters are always the same, so I can StringDrop them. But I want to get rid of everything after the year in parentheses.

    {"Nothing But the Truth (2008) - IMDb</title><meta name=\"description\" content=\"N", 
     "In the Loop (2009) - IMDb</title><meta name=\"description\" content=\"In the Loop", 
     "Getting to Know You (2020) - IMDb</title><meta name=\"description\" content=\"Get"}

The only things I care about are the names and the release dates. Of course, the length of that part differs from string to string, and StringReplacePart and similar functions need explicit character positions. How can I tell Mathematica to drop everything after (xxxx), where xxxx is the year of release?

Thanks
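A minimal sketch of one way to do this with Wolfram Language string patterns instead of fixed character counts. It assumes every string contains the year as exactly four digits in parentheses; `keepThroughYear` and `titles` are hypothetical names used only for illustration:

```mathematica
(* Hypothetical helper: keep everything up to and including the first
   "(dddd)" and drop the rest. In string patterns "___" is greedy by
   default, so it consumes the remainder of the string. The final
   argument 1 limits StringReplace to the first match. *)
keepThroughYear[s_String] :=
  StringReplace[s,
   year : ("(" ~~ Repeated[DigitCharacter, {4}] ~~ ")") ~~ ___ :> year,
   1]

(* Sample strings copied from the output shown in the question *)
titles = {
   "Nothing But the Truth (2008) - IMDb</title><meta name=\"description\" content=\"N",
   "In the Loop (2009) - IMDb</title><meta name=\"description\" content=\"In the Loop",
   "Getting to Know You (2020) - IMDb</title><meta name=\"description\" content=\"Get"};

keepThroughYear /@ titles
(* {"Nothing But the Truth (2008)", "In the Loop (2009)", "Getting to Know You (2020)"} *)
```

Because the pattern requires four digits, a parenthesized number with a different digit count earlier in a title, such as "(500)", is skipped rather than treated as the year.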

POSTED BY: Roger J Brown

3 Replies

Roger J Brown, University of Maryland

Posted 2 years ago

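I had just gotten this to work when I received notification of your replies (len is the length of the list):

    len = Length[names];
    shtStr = StringDrop[StringTake[names[[#]], 100], 22] & /@ Range[len];
    (* each shtStr element is 100 - 22 = 78 characters long, so dropping
       81 - p characters, where p is the position of "IMDb", keeps the first p - 3 *)
    StringDrop[shtStr[[#]], -(81 - StringPosition[shtStr[[#]], {"IMDb"}][[1, 1]])] & /@ Range[len]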
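The same position-based idea can be written without the hard-coded 100, 22, and 81. This sketch assumes each string contains the literal text " - IMDb" right after the year; `cutBeforeIMDb` is a hypothetical name:

```mathematica
(* Hypothetical helper: find the first occurrence of " - IMDb" and keep
   only the characters before it. Strings without that marker are
   returned unchanged. *)
cutBeforeIMDb[s_String] :=
  With[{pos = StringPosition[s, " - IMDb", 1]},
   If[pos === {}, s, StringTake[s, pos[[1, 1]] - 1]]]

cutBeforeIMDb[
 "In the Loop (2009) - IMDb</title><meta name=\"description\" content=\"In the Loop"]
(* "In the Loop (2009)" *)
```

Applied as `cutBeforeIMDb /@ shtStr`, it no longer depends on every string being cut to exactly 78 characters, and it stops before the space that precedes the hyphen.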

POSTED BY: Roger J Brown

Eric Rimbey

Posted 2 years ago

    data = {"Nothing But the Truth (2008) - IMDb</title><meta name=\"description\" content=\"N", 
       "In the Loop (2009) - IMDb</title><meta name=\"description\" content=\"In the Loop", 
       "Getting to Know You (2020) - IMDb</title><meta name=\"description\" content=\"Get"};
    StringReplace[data, RegularExpression["(\\(\\d{4,4}\\)).*"] -> "$1"]

gives

    {"Nothing But the Truth (2008)", "In the Loop (2009)", "Getting to Know You (2020)"}

This adds a bit more robustness by matching on the parenthesized year, so parentheses in the actual movie title are more likely to be preserved. For example:

    StringReplace[
     "(500) Days of Summer (2009) - IMDb</title><meta name=\"description\" content=\"N", 
     RegularExpression["(\\(\\d{4,4}\\)).*"] -> "$1"]

gives

    (500) Days of Summer (2009)

Now, had that been (5000) Days of Summer, we'd need to add more to our pattern. Rather than dropping the first 22 characters and doing string matching, you might instead want to treat this as structured data (it looks like HTML) and extract the title and year directly from the tags.
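One way to handle the (5000) Days of Summer case mentioned above is to accept a parenthesized year only when the " - IMDb" suffix follows it. This is a sketch assuming that suffix is always present; `suffixPattern` is a hypothetical name:

```mathematica
(* Capture "(dddd)" only when it is followed by optional whitespace, a
   hyphen, optional whitespace, and "IMDb"; the trailing ".*" swallows
   the rest of the line, and "$1" puts back just the captured year. *)
suffixPattern = RegularExpression["(\\(\\d{4}\\))\\s*-\\s*IMDb.*"];

StringReplace[
 "(5000) Days of Summer (2009) - IMDb</title><meta name=\"description\" content=\"N",
 suffixPattern -> "$1"]
(* "(5000) Days of Summer (2009)" *)
```

The "(5000)" is not followed by the suffix, so it is left alone as part of the title.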

POSTED BY: Eric Rimbey

Bill Nelson

Posted 2 years ago

Try this

    list={"Nothing But the Truth (2008) - IMDb</title><meta name=\"description\" content=\"N\"", 
      "In the Loop (2009) - IMDb</title><meta name=\"description\" content=\"In the Loop\"", 
      "Getting to Know You (2020) - IMDb</title><meta name=\"description\" content=\"Get\""};
    Map[StringTake[#,StringPosition[#,")"][[1,1]]]&,list]

which returns the list

    {"Nothing But the Truth (2008)","In the Loop (2009)","Getting to Know You (2020)"}
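Taking the first ")" works for these three titles, but it would stop at "(500)" in a title such as (500) Days of Summer, and `[[1,1]]` fails on a string that has no ")" at all. A minimal sketch of a variant that searches for the first parenthesized four-digit year instead, assuming years are always written with four digits; `takeThroughYear` is a hypothetical name:

```mathematica
(* Hypothetical variant: locate the first "(dddd)" rather than the first
   ")", so an earlier parenthesis in the title does not end the match.
   Strings with no such year are returned unchanged instead of raising a
   Part error. *)
takeThroughYear[s_String] :=
  With[{pos = StringPosition[s, "(" ~~ Repeated[DigitCharacter, {4}] ~~ ")", 1]},
   If[pos === {}, s, StringTake[s, pos[[1, 2]]]]]

takeThroughYear[
 "(500) Days of Summer (2009) - IMDb</title><meta name=\"description\" content=\"N\""]
(* "(500) Days of Summer (2009)" *)
```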

POSTED BY: Bill Nelson