welcome to learn2easy
  1. Home
  2. Popular Tutorials
  3. Details of the content

A Practical Guide to Calculating Text Similarity with PHP

2025-06-30 8

A Practical Guide to Calculating Text Similarity with PHP

In modern development, measuring text similarity is a common issue whether for search engine optimization, data deduplication, or content recommendation systems, understanding how to calculate text similarity in PHP is an essential skill. This tutorial will walk you through the steps to implement a concise yet effective text similarity function.

Table of Contents

  1. What is Text Similarity?
  2. Overview of Similarity Calculation Methods
    • i. Edit Distance
    • ii. Cosine Similarity
    • iii. Jaccard Similarity Coefficient
  3. Preparing Your PHP Environment
  4. Implementing Text Similarity Functions
    • i. Edit Distance Function Implementation
    • ii. Cosine Similarity Function Implementation
    • iii. Jaccard Similarity Coefficient Function Implementation
  5. Practical Application Case Studies
  6. Summary and Future Prospects

1. What is Text Similarity?

Text similarity measures the degree of similarity between two text segments. A high similarity means the texts are closely related; a low similarity indicates significant differences. Text similarity is often applied in search engine optimization, recommendation systems, and data deduplication.

2. Overview of Similarity Calculation Methods

There are three common algorithms for calculating text similarity:

i. Edit Distance

Edit Distance, also known as Levenshtein distance, calculates the minimum number of edits required to transform one string into another. Edit operations include inserting, deleting, and replacing characters.

ii. Cosine Similarity

Cosine similarity primarily measures the similarity between two vectors, often represented as term frequencies and computes the cosine value to determine similarity.

iii. Jaccard Similarity Coefficient

Jaccard Similarity compares the similarity of two sets using the formula: J(A, B) = |A ∩ B| / |A ∪ B|.

3. Preparing Your PHP Environment

Before we start, make sure your development environment is set up with PHP and the necessary supporting packages. You can test locally or use an online editor. PHP 7.0 or later is recommended.

4. Implementing Text Similarity Functions

Next, we'll implement the three text similarity algorithms mentioned above, starting with Edit Distance:

i. Edit Distance Function Implementation

<?phpfunction levenshteinDistance($str1, $str2) {    $len1 = strlen($str1);    $len2 = strlen($str2);    $matrix = array();    for ($i = 0; $i <= $len1; $i++) {        $matrix[$i][0] = $i;    }    for ($j = 0; $j <= $len2; $j++) {        $matrix[0][$j] = $j;    }    for ($i = 1; $i <= $len1; $i++) {        for ($j = 1; $j <= $len2; $j++) {            $cost = ($str1[$i - 1] == $str2[$j - 1]) ? 0 : 1;            $matrix[$i][$j] = min(                $matrix[$i - 1][$j] + 1,                $matrix[$i][$j - 1] + 1,                $matrix[$i - 1][$j - 1] + $cost            );        }    }    return $matrix[$len1][$len2];}?>

ii. Cosine Similarity Function Implementation

<?phpfunction cosineSimilarity($str1, $str2) {    $vector1 = array_count_values(str_word_count($str1, 1));    $vector2 = array_count_values(str_word_count($str2, 1));    $intersection = array_intersect_key($vector1, $vector2);    $numerator = 0;    foreach ($intersection as $word => $count) {        $numerator += $count * $vector2[$word];    }    $sum1 = array_sum(array_map(function($v) { return $v * $v; }, $vector1));    $sum2 = array_sum(array_map(function($v) { return $v * $v; }, $vector2));    $denominator = sqrt($sum1) * sqrt($sum2);    return ($denominator == 0) ? 0 : $numerator / $denominator;}?>

iii. Jaccard Similarity Coefficient Function Implementation

<?phpfunction jaccardSimilarity($str1, $str2) {    $words1 = array_unique(explode(' ', $str1));    $words2 = array_unique(explode(' ', $str2));    $intersection = count(array_intersect($words1, $words2));    $union = count(array_unique(array_merge($words1, $words2)));    return ($union == 0) ? 0 : $intersection / $union;}?>

5. Practical Application Case Studies

Next, we'll demonstrate how to apply these functions. Suppose we have the following two text segments:

$text1 = "PHP is a widely-used open-source scripting language.";$text2 = "PHP is a popular open-source scripting programming language.";

We can use these three functions to compute their similarities:

echo "Edit Distance: " . levenshteinDistance($text1, $text2) . " <br>";echo "Cosine Similarity: " . cosineSimilarity($text1, $text2) . " <br>";echo "Jaccard Similarity: " . jaccardSimilarity($text1, $text2) . " <br>";

6. Summary and Future Prospects

This article introduced how to write text similarity functions in PHP, covering Edit Distance, Cosine Similarity, and Jaccard Similarity Coefficient. We hope this tutorial has been helpful for your applications in text similarity.

In the future, we can explore more text processing techniques and apply them to more complex projects. Thank you for reading, and we look forward to your feedback!

Home
Hot
Business
Game
Technology
Art Toys