I have looked at a number of example some similar to the ones above. Some word histories tell nice stories, for example about medicine. Here are three words for malaria:
Options[WordFrequencyPlot] = {"YearStart" -> 1800, "YearEnd" -> Now,
"Case" -> True, "Smooth" -> 3, "Scaling" -> None,
"Style" -> Automatic};
WordFrequencyPlot[words_, OptionsPattern[]] :=
With[{$data =
WordFrequencyData[words,
"TimeSeries", {OptionValue["YearStart"], OptionValue["YearEnd"]},
IgnoreCase -> OptionValue["Case"]]},
DateListPlot[
MapThread[
Callout, {MeanFilter[#,
Quantity[OptionValue["Smooth"], "Years"]] & /@ Values[$data],
words}], ScalingFunctions -> OptionValue["Scaling"],
PlotRange -> All, PlotTheme -> "Detailed",
PlotStyle -> OptionValue["Style"], FrameTicks -> {Automatic, None},
ImageSize -> Large, FrameLabel -> {"YEAR", "FREQUENCY in TEXT"}]]
WordFrequencyPlot[{"ague", "malaria", "paludism"}]
data:image/s3,"s3://crabby-images/6fd3e/6fd3e4400429968b0da32e02ed3b9fd45cb973ad" alt="enter image description here"
Until 1880, when Laveran first discovered the parasite, ague and malaria have basically the same frequency. Malaria is derived from malaria aria "bad air", whereas ague comes from acute febris "acute fever".
Sometimes we can also observe a shift in the frequency of words reflecting meaning the same thing
Options[WordFrequencyPlot] = {"YearStart" -> 1800, "YearEnd" -> Now, "Case" -> True, "Smooth" -> 3, "Scaling" -> None, "Style" -> Automatic};
WordFrequencyPlot[{"Moslem", "Muslim"}]
data:image/s3,"s3://crabby-images/c6531/c65312a422f26330148a961f49db07d77b01f70e" alt="enter image description here"
In those cases a relative frequency plot, i.e. displaying quantiles could be interesting:
StackedDateListPlot[
MapThread[
Callout, {Values[
WordFrequencyData[{"Moslem", "Muslim"}, "TimeSeries"]], {"Moslem",
"Muslim"}}], PlotRange -> All, PlotLayout -> "Percentile",
ImageSize -> Large, PlotStyle -> {Red, Green},
LabelStyle -> Directive[Bold, 16], PlotTheme -> "Detailed"]
data:image/s3,"s3://crabby-images/d3091/d3091d1bcc4f8e1710d358f8c627965283a312ef" alt="enter image description here"
Such a plot is also useful to compare opposites like the words peace and war, which are also studied in an earlier post:
StackedDateListPlot[
MapThread[
Callout, {Values[
WordFrequencyData[{"war", "peace"}, "TimeSeries"]], {"war",
"peace"}}], PlotRange -> All, PlotLayout -> "Percentile",
ImageSize -> Large, PlotStyle -> {Red, Green},
LabelStyle -> Directive[Bold, 16], PlotTheme -> "Detailed"]
data:image/s3,"s3://crabby-images/56e36/56e3665014f9e8bece6b01be86e778b6bcb205a7" alt="enter image description here"
It is interesting to see that since about 1850 the word war is more frequent than peace.
These plot also reflect use of words such as bike and bicycle
StackedDateListPlot[
MapThread[
Callout, {Values[
WordFrequencyData[{"bicycle", "bike"}, "TimeSeries"]], {"bicycle",
"bike"}}], PlotRange -> All, PlotLayout -> "Percentile",
ImageSize -> Large, PlotStyle -> {Red, Green},
LabelStyle -> Directive[Bold, 16], PlotTheme -> "Detailed"]
data:image/s3,"s3://crabby-images/08707/087073ad1f77271ff887d4b2dbaf8848576efb2f" alt="enter image description here"
I would have expected that bike becomes more prominent during the 20th century. Between 1800 and 1880 is is also surprisingly common. I am not sure why, but this could be due to the other meaning of bike which is something like "nest or swarm of bees".
It would be interesting to consider the change of meaning of words. I tried to look at the word "gay" which has changed meaning over the years from lighthearted (13th century), bright and showy (14th century) and happy. It could also imply morality and mean gay women (prostitute) or gay man (womaniser), gay house (brothel). around 1900 it was something like "cheerful"; in the 1980 young users would use it to mean "lame, stupid" around 1990 it got to mean homosexual. I tried to use google n-grams to figure that out, but it didn't really work well. Here are words that are used close to gay over the years:
Table[{k,
StringSplit[
StringSplit[
StringSplit[
StringSplit[
URLExecute[
"https://books.google.com/ngrams/graph?content=gay+*_ADJ&\
year_start=" <> ToString[k] <> "&year_end=" <> ToString[k + 20] <>
"&corpus=15&smoothing=3"], "direct_url="][[2]],
" width"][[1]], "gay%20"][[3 ;;]], "_"][[All, 1]]}, {k, 1800,
2000, 20}]
which gives
data:image/s3,"s3://crabby-images/665ba/665ba33e712e65ecae31fbd9c7a2640cfd563a8c" alt="enter image description here"
The frequency plot is:
Options[WordFrequencyPlot] = {"YearStart" -> 1800, "YearEnd" -> Now,
"Case" -> True, "Smooth" -> 3, "Scaling" -> None,
"Style" -> Automatic};
WordFrequencyPlot[{"gay"}]
data:image/s3,"s3://crabby-images/6afaf/6afaff7885e1cb0316fdff7491d14113c9b9e440" alt="enter image description here"
or over longer times:
Options[WordFrequencyPlot] = {"YearStart" -> 1500, "YearEnd" -> Now,
"Case" -> True, "Smooth" -> 3, "Scaling" -> None,
"Style" -> Automatic};
WordFrequencyPlot[{"gay"}]
data:image/s3,"s3://crabby-images/d0c41/d0c41feaf08d19371734179ffd66eef3576a8b68" alt="enter image description here"
Also plastic has changed meaning from the the characteristic of being plastic to the material plastic:
WordFrequencyPlot[{"plastic"}]
data:image/s3,"s3://crabby-images/82469/8246940b3fd6f4dd267fee72779e1b7771d994b4" alt="enter image description here"
In general we can see when different products have been developed:
WordFrequencyPlot[{"radio", "telephone", "computer", "car", "watch",
"electricity"}, "YearStart" -> 1700, "YearEnd" -> Now]
data:image/s3,"s3://crabby-images/dbb6a/dbb6abccb2ff74b7113877f984e3b44198e1902b" alt="enter image description here"
Of course, words can come out of fashion, too. For example:
WordFrequencyPlot[{"Pence", "Dollar", "Shilling", "Euro", "Sterling",
"Farthing", "Florin", "Dime", "Yen", "Yuan"}, "YearStart" -> 1700,
"YearEnd" -> Now]
data:image/s3,"s3://crabby-images/3a43e/3a43e4f1f2a629aabfcf9ee5b0c68fdd79d040ed" alt="enter image description here"
In fact we can study this more systematically, by looking at the correlations between frequency curves:
words = {"war", "peace", "communism", "capitalism", "socialism",
"democracy", "unemployment", "conflict", "crisis", "terrorism",
"military", "welfare", "bomb", "weapons", "combat"}
worddata = (WordFrequencyData[#, "TimeSeries"])["Values"] & /@ words;
cm = Correlation[
Transpose@
worddata[[All,
1 ;; Min[
Table[Length[worddata[[i, ;;]]], {i, 1,
Length[words] - 1}]]]]];
Column[{GraphicsRow[words[[1 ;;]], ImageSize -> 1000, Frame -> All],
Row[{GraphicsColumn[words[[1 ;;]], ImageSize -> 67, Frame -> All],
Overlay[{ArrayPlot[cm,
ColorFunction -> (ColorData["TemperatureMap"][(1 + #)/2] &),
Frame -> None, Mesh -> True, PlotRangePadding -> 0,
ImageSize -> 1000, ColorFunctionScaling -> False],
GraphicsGrid[Map[NumberForm[#, 2] &, cm, {2}],
ImageSize -> 1000]}]}]}, Alignment -> Right, Spacings -> 0]
data:image/s3,"s3://crabby-images/d1f04/d1f04d7069dae183358a0dfa93a9f94c9bf2f1d8" alt="enter image description here"
We can use a BandwidthOrdering
Needs["GraphUtilities`"]
{r, c} = MinimumBandwidthOrdering[cm, Method -> "RCMD"]
cm2 = Correlation[
Transpose@
worddata[[r]][[All,
1 ;; Min[
Table[Length[worddata[[r]][[i, ;;]]], {i, 1,
Length[words]}]]]]];
Column[{GraphicsRow[words[[r]], ImageSize -> 1000, Frame -> All],
Row[{GraphicsColumn[words[[r]], ImageSize -> 67, Frame -> All],
Overlay[{ArrayPlot[cm2,
ColorFunction -> (ColorData["TemperatureMap"][(1 + #)/2] &),
Frame -> None, Mesh -> True, PlotRangePadding -> 0,
ImageSize -> 1000, ColorFunctionScaling -> False],
GraphicsGrid[Map[NumberForm[#, 2] &, cm2, {2}],
ImageSize -> 1000]}]}]}, Alignment -> Right, Spacings -> 0]
data:image/s3,"s3://crabby-images/e7b8f/e7b8f4c5b9bed048bde7997a0b5276bfaed7e556" alt="enter image description here"
Using that we can try to find words with a similar behaviour:
StackedDateListPlot[
MapThread[Callout,
Log /@ {Values[
WordFrequencyData[{"Democracy", "War", "Peace"},
"TimeSeries"]], {"Democracy", "War", "Peace"}}],
PlotRange -> All, PlotLayout -> "Percentile", ImageSize -> Large,
PlotStyle -> {Red, Green, Blue}, LabelStyle -> Directive[Bold, 16],
PlotTheme -> "Detailed"]
data:image/s3,"s3://crabby-images/251f9/251f9634317ba5b5249724d86cbfb0418ef60efe" alt="enter image description here"
which indicates near constant ratios over a long time. This is not that easy to see in the FrequencyPlot
WordFrequencyPlot[{"Democracy", "War", "Peace"}, "YearStart" -> 1900,
"YearEnd" -> Now, "Scaling" -> {None, "Log"}]
.
Finally, it is interesting to look at other languages such as tu vs usted in Spanish
Options[WordFrequencyPlotSpanish] = {"YearStart" -> 1800,
"YearEnd" -> Now, "Case" -> True, "Smooth" -> 3, "Scaling" -> None,
"Style" -> Automatic};
WordFrequencyPlotSpanish[words_, OptionsPattern[]] :=
With[{$data =
WordFrequencyData[words,
"TimeSeries", {OptionValue["YearStart"], OptionValue["YearEnd"]},
IgnoreCase -> OptionValue["Case"], Language -> "Spanish"]},
DateListPlot[
MapThread[
Callout, {MeanFilter[#,
Quantity[OptionValue["Smooth"], "Years"]] & /@ Values[$data],
words}], ScalingFunctions -> OptionValue["Scaling"],
PlotRange -> All, PlotTheme -> "Detailed",
PlotStyle -> OptionValue["Style"], FrameTicks -> {Automatic, None},
ImageSize -> Large, FrameLabel -> {"YEAR", "FREQUENCY in TEXT"}]]
WordFrequencyPlotSpanish[{"vosotros", "ustedes"}]
data:image/s3,"s3://crabby-images/3d76a/3d76a2b30479cd76f8c9f30e0b11f83e2af02c7d" alt="enter image description here"
or Du and Sie in German
Options[WordFrequencyPlotGerman] = {"YearStart" -> 1800,
"YearEnd" -> Now, "Case" -> True, "Smooth" -> 3, "Scaling" -> None,
"Style" -> Automatic};
WordFrequencyPlotGerman[words_, OptionsPattern[]] :=
With[{$data =
WordFrequencyData[words,
"TimeSeries", {OptionValue["YearStart"],
OptionValue["YearEnd"]},(*IgnoreCase\[Rule]OptionValue["Case"],*)
IgnoreCase -> False, Language -> "German"]},
DateListPlot[
MapThread[
Callout, {MeanFilter[#,
Quantity[OptionValue["Smooth"], "Years"]] & /@ Values[$data],
words}], ScalingFunctions -> OptionValue["Scaling"],
PlotRange -> All, PlotTheme -> "Detailed",
PlotStyle -> OptionValue["Style"], FrameTicks -> {Automatic, None},
ImageSize -> Large, FrameLabel -> {"YEAR", "FREQUENCY in TEXT"}]]
WordFrequencyPlotGerman[{"Du", "Sie"}]
I suppose that there are interesting mechanisms working here. It definitely feels that "Du" becomes more prevalent as opposed to the more formal "Sie". But there might be an affect due to (social?) media etc. in the opposite direction.
Here are a couple of pronouns in English:
WordFrequencyPlot[{"you", "thou", "ye", "thee", "thy"},
"YearStart" -> 1200, "YearEnd" -> Now]
data:image/s3,"s3://crabby-images/1e9e8/1e9e8121004040b4beb8fa0809a57f4277756220" alt="enter image description here"
which might look better on a percentile plot:
StackedDateListPlot[
MapThread[
Callout, {Values[
WordFrequencyData[{"you", "thou", "ye", "thee", "thy"},
"TimeSeries"]], {"you", "thou", "ye", "thee", "thy"}}],
PlotRange -> All, PlotLayout -> "Percentile", ImageSize -> Large,
PlotStyle -> RandomColor[5], LabelStyle -> Directive[Bold, 16],
PlotTheme -> "Detailed"]
data:image/s3,"s3://crabby-images/0a5a8/0a5a83f8ed087503a1dd64620534d7bd2ffca6a3" alt="enter image description here"
Logarithmically, this becomes:
StackedDateListPlot[
MapThread[Callout,
Log@{Values[
WordFrequencyData[{"you", "thou", "ye", "thee", "thy"},
"TimeSeries"]], {"you", "thou", "ye", "thee", "thy"}}],
PlotRange -> All, PlotLayout -> "Percentile", ImageSize -> Large,
PlotStyle -> RandomColor[5], LabelStyle -> Directive[Bold, 16],
PlotTheme -> "Detailed"]
data:image/s3,"s3://crabby-images/01991/01991de7c1d299f62ad552aed76a68abb4a4424b" alt="enter image description here"
Cheers,
Marco