PHP – Look for similar expressions (similarities) in a list of strings

How to look for similar expressions (similarities) in a list of strings in PHP.

Source code:

$list = [
'a' => 'Ligula, potenti elementum aenean incididunt velit ullamco leo etiam. ',
'b' => 'Sollicitudin habitasse fugiat ante aptent, vitae facilisis varius netus id porta.',
'c' => 'Vitae vivamus posuere ad commodo et cubilia mattis et quisque.',
'd' => 'Euismod donec primis convallis laborum diam ultrices dolor ut suscipit ad incididunt facilisis, tristique mattis velit',
'demo1' => 'Clas conubia molestie elite ultriciesus laoret neque veniam, fringilus risus.',
'demo2' => 'Class conubia molestie elit ultricies laoreet neque veniam, fringilla risus.',
'demo3' => 'Molestie elit ultricies laoreet',
'x' => 'Massa himenaeos diam.',
];

$phrase = "Fringilla risus, class conubia molestie elit ultricies laoreet neque veniam. ";

/**
 * Look for similar expressions (similarities) in a list of expressions or strings.
 * $needle   : string; the phrase you are looking for.
 * $haystack : array of strings to search in; every element must be a string,
 *             otherwise strpos() may raise a warning or a TypeError, depending on the PHP version.
 * $maxhit   : output, set by reference; the score of a perfect match, $len*($len+1)/2.
 *             Compare the returned scores against it to judge how close a match is.
 * $minhit   : output, set by reference; scores below it can be ignored,
 *             because the chance of similarity is very small.
 * ########
 * Returns an array with the keys of $haystack and one score per element, sorted in descending order.
 * Scores grow non-linearly (a matching run of n characters adds 1 + 2 + ... + n).
 * The bigger the number, the better the similarity.
 */
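function search_similarity(string $needle, array $haystack, &$maxhit=0, &$minhit=0) {
  $len = strlen($needle);

  // Best possible score: every prefix of $needle matches, giving 1 + 2 + ... + $len.
  $maxhit = $len*($len+1)/2;
  // Noise threshold: the same sum for a run one quarter of the needle's length.
  $minhit = round( ($len/4)*(($len/4)+1)/2 );

  $hit_list = [];
  foreach ($haystack as $key => $val) {
    $hits = 0;
    $s = '';  // current run of needle characters that occurs in $val
    for ($i = 0; $i < $len; $i++) {
      $s .= $needle[$i];
      // strpos() is case-sensitive; once the run no longer occurs in $val, start over.
      if (strpos($val, $s) === false) {
        $s = '';
      }
      // Each step adds the current run length, so a run of n adds 1 + 2 + ... + n in total.
      $hits += strlen($s);
    }
    $hit_list[$key] = $hits;
  }
  arsort($hit_list);  // highest score first, keys preserved
  return $hit_list;
}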
  
$similarity = search_similarity($phrase, $list, $maxhit, $minhit);

echo '<pre>';
echo 'LOOK for: '.PHP_EOL.$phrase;
echo PHP_EOL.PHP_EOL;
echo 'SEARCH in LIST: ';
print_r($list);
echo PHP_EOL;
echo 'SEARCH RESULT: ';
print_r($similarity);
echo PHP_EOL;
echo 'maxhit='.$maxhit.'; minhit='.$minhit;
echo '</pre>';
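
The function compares with strpos(), which is case-sensitive, so "class" in the phrase does not match "Class" in a list entry. One possible workaround, sketched here as an assumption rather than as part of the original code, is to lowercase both sides before scoring. The wrapper name search_similarity_ci() is hypothetical. The block repeats a condensed copy of search_similarity() so that it runs on its own as a separate file; strtolower() only handles single-byte letters reliably, which fits the byte-wise strlen() and indexing used by the function.

```php
<?php
// Condensed copy of search_similarity() from the listing above,
// repeated only so this sketch can run as a standalone file.
function search_similarity(string $needle, array $haystack, &$maxhit = 0, &$minhit = 0) {
    $len = strlen($needle);
    $maxhit = $len * ($len + 1) / 2;
    $minhit = round(($len / 4) * (($len / 4) + 1) / 2);
    $hit_list = [];
    foreach ($haystack as $key => $val) {
        $hits = 0;
        $s = '';
        for ($i = 0; $i < $len; $i++) {
            $s .= $needle[$i];
            if (strpos($val, $s) === false) {
                $s = '';
            }
            $hits += strlen($s);
        }
        $hit_list[$key] = $hits;
    }
    arsort($hit_list);
    return $hit_list;
}

// Hypothetical wrapper (not in the original post): lowercase the needle and
// every haystack entry, then reuse the unchanged scoring function.
function search_similarity_ci(string $needle, array $haystack, &$maxhit = 0, &$minhit = 0) {
    // array_map() with a single array keeps the original keys.
    $lowered = array_map('strtolower', $haystack);
    return search_similarity(strtolower($needle), $lowered, $maxhit, $minhit);
}

$cs = search_similarity('Hello', ['a' => 'hello world']);
$ci = search_similarity_ci('Hello', ['a' => 'hello world'], $maxhit);

echo $cs['a'], PHP_EOL;   // 10: the capital "H" never matches, only "ello" does
echo $ci['a'], PHP_EOL;   // 15: the whole needle matches
echo $maxhit, PHP_EOL;    // 15: 5*(5+1)/2, the score of a perfect match
```

In this small case the case-insensitive score equals $maxhit, because the lowercased needle occurs in the entry as a whole.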

 

Result example:

LOOK for: 
Fringilla risus, class conubia molestie elit ultricies laoreet neque veniam. 

SEARCH in LIST: Array
(
    [a] => Ligula, potenti elementum aenean incididunt velit ullamco leo etiam. 
    [b] => Sollicitudin habitasse fugiat ante aptent, vitae facilisis varius netus id porta.
    [c] => Vitae vivamus posuere ad commodo et cubilia mattis et quisque.
    [d] => Euismod donec primis convallis laborum diam ultrices dolor ut suscipit ad incididunt facilisis, tristique mattis velit
    [demo1] => Clas conubia molestie elite ultriciesus laoret neque veniam, fringilus risus.
    [demo2] => Class conubia molestie elit ultricies laoreet neque veniam, fringilla risus.
    [demo3] => Molestie elit ultricies laoreet
    [x] => Massa himenaeos diam.
)

SEARCH RESULT: Array
(
    [demo2] => 1705
    [demo3] => 493
    [demo1] => 459
    [d] => 99
    [a] => 77
    [c] => 60
    [b] => 56
    [x] => 43
)

maxhit=3003; minhit=195
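
The docblock describes $maxhit as a yardstick and $minhit as a cutoff. A minimal sketch of how the two might be used together follows, assuming the scores and bounds printed in the result example above. The helper rank_matches() is hypothetical and not part of the original code.

```php
<?php
// Scores and bounds copied from the result example above.
$similarity = [
    'demo2' => 1705, 'demo3' => 493, 'demo1' => 459, 'd' => 99,
    'a' => 77, 'c' => 60, 'b' => 56, 'x' => 43,
];
$maxhit = 3003;
$minhit = 195;

// Hypothetical helper: drop scores below $minhit and express the rest
// as a percentage of $maxhit, rounded to one decimal place.
function rank_matches(array $scores, float $maxhit, float $minhit): array {
    $ranked = [];
    if ($maxhit <= 0) {
        return $ranked;  // empty needle: nothing to compare against
    }
    foreach ($scores as $key => $hits) {
        if ($hits < $minhit) {
            continue;    // below the cutoff, treated as noise
        }
        $ranked[$key] = round($hits / $maxhit * 100, 1);
    }
    return $ranked;      // keeps the descending order of the input
}

$ranked = rank_matches($similarity, $maxhit, $minhit);
print_r($ranked);
// Array ( [demo2] => 56.8 [demo3] => 16.4 [demo1] => 15.3 )
```

Because the score adds up run lengths, it grows roughly with the square of the matched length, so these percentages are better read as a ranking aid than as a share of matching characters.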

 

Written by: byrev
