Semantex's text comparison function lets you discover duplicate content and content that is similar either in the words used or in meaning. Knowing how similar two sets of content are is useful for simple use cases such as search and for more complex ones, including migrating content to new platforms, keeping compliance language consistent, and identifying opportunities to rationalize and consolidate content within a large document library.

While traditional text comparison algorithms focus on detecting an overlap in the words used (syntactic), Semantex's NLP models can also assess similarity based on the meaning of the content (semantic). Applied at scale, this capability surfaces similar content that word-overlap methods alone would miss.

What is syntactic comparison? Syntactic comparison detects the similarity of the words and structures used within text. This is useful when trying to find exact or nearly exact matches within your content and is the method of comparison used by most traditional representation-based algorithms.
What is semantic comparison? Semantex's semantic comparison evaluates text based on its meaning, even when that meaning is expressed with different words and structure. It also works in the other direction: when comparing “How are you?” with “How old are you?”, a syntactic comparison reports a match with a similarity score of 87.5%, because the two phrases share most of their words, but a semantic evaluation shows that they ask different questions and are not a match.
What is cross-language comparison? Consider the phrases “how are you” and “cómo estás”. Both have nearly identical meanings in their respective languages, but traditional representation-based (syntactic) algorithms won't pick up on this similarity because the phrases share no words. Semantex's cross-language comparison can identify text with similar meaning within your content even when it's expressed in different languages.
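
To make the limits of word overlap concrete, here is a minimal Python sketch of one common syntactic measure, Jaccard similarity over word sets. It is an illustration only and is not Semantex's algorithm: the function names `tokens` and `jaccard_similarity` are hypothetical, and the scores it produces will differ from the 87.5% figure quoted above, which comes from a different scoring method.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Share of distinct words that the two texts have in common (0.0 to 1.0)."""
    words_a, words_b = tokens(a), tokens(b)
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# High word overlap, although the two questions mean different things.
print(jaccard_similarity("How are you?", "How old are you?"))  # 0.75

# No shared words, although the two phrases mean nearly the same thing.
print(jaccard_similarity("how are you", "cómo estás"))  # 0.0
```

Both results point the wrong way when meaning is what matters: the first pair scores high despite asking different questions, and the second scores zero despite being near translations. These are the cases that semantic and cross-language comparison are meant to handle.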
