How to look for similar expressions (similarities) in a list of strings in PHP
Source code:
$list = [
'a' => 'Ligula, potenti elementum aenean incididunt velit ullamco leo etiam. ',
'b' => 'Sollicitudin habitasse fugiat ante aptent, vitae facilisis varius netus id porta.',
'c' => 'Vitae vivamus posuere ad commodo et cubilia mattis et quisque.',
'd' => 'Euismod donec primis convallis laborum diam ultrices dolor ut suscipit ad incididunt facilisis, tristique mattis velit',
'demo1' => 'Clas conubia molestie elite ultriciesus laoret neque veniam, fringilus risus.',
'demo2' => 'Class conubia molestie elit ultricies laoreet neque veniam, fringilla risus.',
'demo3' => 'Molestie elit ultricies laoreet',
'x' => 'Massa himenaeos diam.',
];
$phrase = "Fringilla risus, class conubia molestie elit ultricies laoreet neque veniam. ";
/**
 * Look for similar expressions (similarities) in a list of expressions or strings.
 * $needle   : string; the phrase you are looking for
 * $haystack : the target array in which the phrase is searched; the elements must be strings, otherwise the function raises a warning
 * $maxhit   : (output) the score of a perfect match, i.e. the whole $needle occurs unbroken in an element; compare the returned scores with it to judge how close a match is
 * $minhit   : (output) scores below $minhit can be ignored, the chance of similarity is very small
 * ########
 * The scores are non-linear: a run of n consecutive matching characters adds 1+2+...+n = n*(n+1)/2.
 * The result is sorted in descending order; the bigger the number, the better the similarity.
 */
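function search_similarity(string $needle, array $haystack, &$maxhit = 0, &$minhit = 0) {
  $len = strlen($needle);
  // Score of a perfect match: one unbroken run of $len characters (1 + 2 + ... + $len).
  $maxhit = $len * ($len + 1) / 2;
  // Score of an unbroken run covering a quarter of the needle; used as the noise threshold.
  $minhit = round(($len / 4) * (($len / 4) + 1) / 2);
  $hit_list = [];
  foreach ($haystack as $key => $val) {
    $i = 0;
    $hits = 0;
    $s = ''; // current run of needle characters that occurs in $val
    while ($i < $len) {
      $s .= $needle[$i];
      $pos = strpos($val, $s); // case-sensitive substring search
      if ($pos === false) {
        $s = ''; // run broken: start over with the next character
      }
      $hits += strlen($s); // a run of length n adds 1 + 2 + ... + n in total
      $i++;
    }
    $hit_list[$key] = $hits;
  }
  arsort($hit_list); // highest score first, keys preserved
  return $hit_list;
}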
  
$similarity = search_similarity($phrase, $list, $maxhit, $minhit);
echo '<pre>';
echo 'LOOK for: '.PHP_EOL.$phrase;
echo PHP_EOL.PHP_EOL;
echo 'SEARCH in LIST: ';
print_r($list);
echo PHP_EOL;
echo 'SEARCH RESULT: ';
print_r($similarity);
echo PHP_EOL;
echo 'maxhit='.$maxhit.'; minhit='.$minhit;
Result example:
LOOK for: 
Fringilla risus, class conubia molestie elit ultricies laoreet neque veniam. 
SEARCH in LIST: Array
(
    [a] => Ligula, potenti elementum aenean incididunt velit ullamco leo etiam. 
    [b] => Sollicitudin habitasse fugiat ante aptent, vitae facilisis varius netus id porta.
    [c] => Vitae vivamus posuere ad commodo et cubilia mattis et quisque.
    [d] => Euismod donec primis convallis laborum diam ultrices dolor ut suscipit ad incididunt facilisis, tristique mattis velit
    [demo1] => Clas conubia molestie elite ultriciesus laoret neque veniam, fringilus risus.
    [demo2] => Class conubia molestie elit ultricies laoreet neque veniam, fringilla risus.
    [demo3] => Molestie elit ultricies laoreet
    [x] => Massa himenaeos diam.
)
SEARCH RESULT: Array
(
    [demo2] => 1705
    [demo3] => 493
    [demo1] => 459
    [d] => 99
    [a] => 77
    [c] => 60
    [b] => 56
    [x] => 43
)
maxhit=3003; minhit=195
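The raw scores are easier to read when related to $maxhit and cut off at $minhit, as the function's comment suggests. Here $maxhit = 77*78/2 = 3003, because the phrase is 77 characters long. The following is only a minimal sketch of that idea: the helper name `similarity_percent` is hypothetical and not part of the original code, and the input array simply repeats the scores from the result above.

```php
<?php
// Hypothetical helper (not part of the original post): drops scores below
// $minhit and expresses the remaining ones as a percentage of $maxhit.
function similarity_percent(array $scores, float $maxhit, float $minhit): array {
  $result = [];
  if ($maxhit <= 0) {
    return $result; // empty needle: nothing to compare against
  }
  foreach ($scores as $key => $hits) {
    if ($hits < $minhit) {
      continue; // too little overlap to be a meaningful match
    }
    $result[$key] = round($hits / $maxhit * 100, 1);
  }
  return $result; // keeps the order of $scores
}

// Scores, maxhit and minhit copied from the result example above.
$scores = [
  'demo2' => 1705, 'demo3' => 493, 'demo1' => 459, 'd' => 99,
  'a' => 77, 'c' => 60, 'b' => 56, 'x' => 43,
];
print_r(similarity_percent($scores, 3003, 195));
// Only demo2 (56.8), demo3 (16.4) and demo1 (15.3) remain.
```

Because the score grows quadratically with the length of each matching run, these percentages are not linear either: demo2 contains every word of the phrase, but in a different order and with different capitalization, so it reaches only about 57% of $maxhit.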