Dear Gregory,
I don't know the algorithms but I suppose that the list of stop words the WL uses is:
list = Complement[DictionaryLookup["*"], DeleteStopwords[DictionaryLookup["*"]]]
![enter image description here](http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2016-06-07at21.23.38.png&userId=48754)
I can then also build my own function to delete stop words:
Select[TextWords[ExampleData[{"Text", "AliceInWonderland"}]], ! MemberQ[list, #] &]
It is about as efficient as the original:
Select[TextWords[ExampleData[{"Text", "AliceInWonderland"}]], ! MemberQ[list, #] &]; // RepeatedTiming
(0.69 seconds)
DeleteStopwords[TextWords[ExampleData[{"Text", "AliceInWonderland"}]]]; // RepeatedTiming
(0.74 seconds)
Also it does not give quite the same result
Complement[DeleteStopwords[TextWords[ExampleData[{"Text", "AliceInWonderland"}]]],
Select[TextWords[ExampleData[{"Text", "AliceInWonderland"}]], ! MemberQ[list, #] &]]
![enter image description here](http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2016-06-07at21.30.45.png&userId=48754)
and
Complement[Select[TextWords[ExampleData[{"Text", "AliceInWonderland"}]], ! MemberQ[list, #] &],
DeleteStopwords[TextWords[ExampleData[{"Text", "AliceInWonderland"}]]]]
![enter image description here](http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2016-06-07at21.31.57.png&userId=48754)
The last result suggests that this is better:
Select[TextWords[ExampleData[{"Text", "AliceInWonderland"}]], ! MemberQ[ToLowerCase /@ list, ToLowerCase[#]] &]
as you can easily see:
Complement[Select[TextWords[ExampleData[{"Text", "AliceInWonderland"}]], ! MemberQ[ToLowerCase /@ list, ToLowerCase[#]] &],
DeleteStopwords[TextWords[ExampleData[{"Text", "AliceInWonderland"}]]]]
![enter image description here](http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2016-06-07at22.58.20.png&userId=48754)
So they are not quite the same, but the "words" that are different are few. You also notice that the last output contains many words that finish with 's, which is easy to fix, too.
The little function that we built above can be fed with your favourite word list.
I know that this does not answer your question, but I hope it is useful.
Best wishes,
Marco
PS: I quite enjoy the fact that Mathematica appears to treat all in-laws as stop words.