Distances and weights in semantic space

Posted 10 years ago

I am using latent semantic analysis (LSA) to categorize news articles. Could you help me with the following questions?

  1. Which distance d is most suitable for measuring similarity in semantic space: cosine, Euclidean, Manhattan, or another Minkowski distance?
  2. Next, for news categorization, I plan to weight words by a function w of their distance d to a given point in semantic space. What is the most suitable function w(d)? Thanks for any suggestions.
POSTED BY: Andrii Ey
2 Replies
Posted 10 years ago

Hi Daniel,

Thanks for the answer. SVD just gives me the coordinates of words and documents in a D-dimensional space, and I can use any metric there. My question is which metric is most commonly used. I get different results with cosine, Euclidean, and Manhattan distances, but cosine seems to work better than the others.
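
A minimal sketch of why the results can differ, assuming NumPy and made-up 2-dimensional coordinates (the vectors and the helper names `minkowski` and `cosine_distance` are illustrative, not taken from any actual LSA run): cosine distance ignores vector length, while the Minkowski distances do not, so the two families can rank the same neighbors differently.

```python
import numpy as np

def minkowski(u, v, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v; insensitive to length."""
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical coordinates in a 2-dimensional semantic space.
query = np.array([1.0, 0.0])
same_direction = np.array([5.0, 0.0])  # same direction as the query, much longer
nearby = np.array([1.0, 1.0])          # close in position, different direction

for name, vec in [("same_direction", same_direction), ("nearby", nearby)]:
    print(
        name,
        "cosine:", cosine_distance(query, vec),
        "euclidean:", minkowski(query, vec, 2),
        "manhattan:", minkowski(query, vec, 1),
    )
```

Here cosine distance treats `same_direction` as the closer neighbor of `query`, while Euclidean and Manhattan distances both prefer `nearby`. Which behavior is preferable depends on whether vector length carries meaning in the application.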

Andrii

POSTED BY: Andrii Ey

I might be mistaken, but I think if you implement LSA using principal component analysis (PCA), that will use the singular value decomposition (SVD) under the hood. When the dust settles from all those acronyms, you end up using a Euclidean metric, because that's what SVD gives you. So this flavor of PCA is a least-squares method.
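
The least-squares property can be checked numerically. The following is a small sketch, assuming NumPy and a random matrix standing in for a term-document matrix (the matrix, its size, and the rank `k` are arbitrary choices for illustration): truncating the SVD to rank k leaves a Frobenius-norm (sum of squared differences) error equal to the root of the sum of the squared discarded singular values, and by the Eckart-Young theorem no other rank-k matrix does better.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))  # hypothetical stand-in for a term-document matrix
k = 2                   # number of latent dimensions kept (arbitrary here)

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation, as used in LSA.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of the residual, i.e. the root of the summed squared errors.
residual = np.linalg.norm(A - A_k)

# Root of the sum of squared discarded singular values.
expected = np.sqrt(np.sum(s[k:] ** 2))

# Some other rank-k matrix for comparison; it cannot beat the truncation.
B = rng.random((6, k)) @ rng.random((k, 4))
other_error = np.linalg.norm(A - B)

print(residual, expected, other_error)
```

This only demonstrates that the truncation is optimal in the Euclidean (least-squares) sense; it does not by itself say which distance works best for comparing documents afterwards.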

POSTED BY: Daniel Lichtblau