Hi Vitaliy,
I wrote a simple n-gram WordCloud function, shown below.
The code quality could still be improved, but it works with the n-gram idea.
Do you have any suggestions for improving the code quality or efficiency?
I have attached the notebook to make it easier to follow.
nGramWords[text_String, n_Integer: 4, filterLevel_Integer: 2] :=
 Module[{
   words = DeleteStopwords[TextWords[ToLowerCase[text]]],
   nGramInitial, nGramTable, removeValue, newLine, separate},
  (* split a k-gram into its two overlapping (k-1)-grams; kept local to the Module *)
  separate[list_] :=
   With[{l = Length@list}, {Take[list, l - 1], Take[list, -(l - 1)]}];
  (* counts of n-grams, (n-1)-grams, ..., 1-grams that occur at least filterLevel times *)
  nGramInitial =
   Normal@Table[
     Select[WordCounts[StringRiffle[words], i], # >= filterLevel &],
     {i, n, 1, -1}];
  (* wrap the 1-gram keys in a list so that every key has the same form *)
  nGramInitial =
   Join[Drop[nGramInitial, -1],
    {({#[[1]]} -> #[[2]]) & /@ Last@nGramInitial}];
  nGramTable = {First@nGramInitial};
  Do[
   (* negative counts for the two sub-grams of every entry in the previous row *)
   removeValue =
    Flatten@Table[
      Thread[Rule[separate[First@Last[nGramTable][[i]]],
        Table[-Last@Last[nGramTable][[i]], {2}]]],
      {i, 1, Length@Last[nGramTable]}];
   (* merge entries that share a key (the first two counts of each group are added), *)
   (* keep counts >= 1 and sort by count, descending *)
   newLine =
    Sort[
     Select[
      Flatten@If[Length[#] > 1,
          #[[1]][[1]] -> (#[[1]][[2]] + #[[2]][[2]]),
          #[[1]]] & /@
       GatherBy[Join[removeValue, nGramInitial[[j]]], First],
      #[[2]] >= 1 &],
     #1[[2]] > #2[[2]] &];
   nGramTable = Append[nGramTable, newLine],
   {j, 2, n}];
  (* join the words of each n-gram into a phrase; return {phrase, count} pairs, most frequent first *)
  Sort[
   {StringRiffle[#[[1]]], #[[2]]} & /@
    Select[Flatten[nGramTable], #[[2]] >= filterLevel &],
   #1[[2]] > #2[[2]] &]
  ];
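
For anyone who wants to try it, here is a minimal usage sketch. It assumes nGramWords is defined as above; the sample text (the built-in ExampleData entry for Alice in Wonderland) and the parameter values 3 and 5 are only illustrative choices, and the variable names are hypothetical.

```mathematica
(* Assumption: nGramWords is already defined as in the post above. *)
(* The sample text is an illustrative choice, not part of the original notebook. *)
text = ExampleData[{"Text", "AliceInWonderland"}];

(* phrases of up to 3 words, with a count threshold of 5 *)
phrases = nGramWords[text, 3, 5];

(* convert the {phrase, count} pairs into phrase -> count weights for WordCloud *)
WordCloud[Association[Rule @@@ phrases]]
```

The conversion to an Association is used because WordCloud accepts the same word -> weight form that WordCounts returns.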
