Below are two PHP functions that return every combination of consecutive words in a phrase. The algorithm can be translated into any other language; I originally wrote it in Delphi/Pascal:
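function natural_combi_words($phrase){
    // $arw[$cnt] collects every run of ($cnt + 1) consecutive words
    $arw = array();
    // Split on any run of whitespace, so double spaces do not produce empty words
    $words = preg_split('/\s+/', trim($phrase));
    $nw = count($words);
    // $i is the index of the first word of the run
    for ($i = 0; $i < $nw; $i++){
        $k = '';
        $cnt = 0;
        // Extend the run one word at a time until the end of the phrase
        for ($j = $i; $j < $nw; $j++){
            $k .= $words[$j].' ';
            $arw[$cnt][] = trim($k);
            $cnt++;
        }
    }
    return $arw;
}

function natural_combi_words2($phrase){
    // Flatten the grouped result into one list, shortest runs first
    $combi_source = natural_combi_words($phrase);
    $combi_words = $combi_source[0];
    $nw = count($combi_source);
    for ($i = 1; $i < $nw; $i++){
        $combi_words = array_merge($combi_words, $combi_source[$i]);
    }
    return $combi_words;
}

$phrase = "I have a dream that one day";
print_r(natural_combi_words($phrase));
print_r(natural_combi_words2($phrase));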
If the phrase is "I have a dream that one day", natural_combi_words2() returns:
Array
(
[0] => I
[1] => have
[2] => a
[3] => dream
[4] => that
[5] => one
[6] => day
[7] => I have
[8] => have a
[9] => a dream
[10] => dream that
[11] => that one
[12] => one day
[13] => I have a
[14] => have a dream
[15] => a dream that
[16] => dream that one
[17] => that one day
[18] => I have a dream
[19] => have a dream that
[20] => a dream that one
[21] => dream that one day
[22] => I have a dream that
[23] => have a dream that one
[24] => a dream that one day
[25] => I have a dream that one
[26] => have a dream that one day
[27] => I have a dream that one day
)
and natural_combi_words() returns a multi-dimensional array, with one group per word count, like this:
Array
(
[0] => Array
(
[0] => I
[1] => have
[2] => a
[3] => dream
[4] => that
[5] => one
[6] => day
)
[1] => Array
(
[0] => I have
[1] => have a
[2] => a dream
[3] => dream that
[4] => that one
[5] => one day
)
[2] => Array
(
[0] => I have a
[1] => have a dream
[2] => a dream that
[3] => dream that one
[4] => that one day
)
[3] => Array
(
[0] => I have a dream
[1] => have a dream that
[2] => a dream that one
[3] => dream that one day
)
[4] => Array
(
[0] => I have a dream that
[1] => have a dream that one
[2] => a dream that one day
)
[5] => Array
(
[0] => I have a dream that one
[1] => have a dream that one day
)
[6] => Array
(
[0] => I have a dream that one day
)
)
I used this solution back in 2003 in a video player that looks in the subtitles folder for a subtitle matching the movie, even when the movie's file name is not 100% identical to the subtitle's name. The program always chooses the right subtitle if there is one!
Example:
video file:
Laughing.out.Loud.2009.1080p.Blu-ray.REMUX.AVC.DTS-HD.MA.5.1-playBD.avi
the subtitle chosen from thousands of other subtitles in the same folder:
LOL.(Laughing.out.Loud).2009.BDRip.XviD.HORiZON-ArtSubs.ENG-RO.srt
In practice, you use this algorithm to calculate a rating for every scanned subtitle and choose the one with the highest value. You can also keep the 2-3 subtitles with the highest (valid) ratings rather than just one.
Of course, the rating must be calculated from the multi-dimensional array, group by group, with the very small groups eliminated, and so on. The final solution is laborious and looks like an A.I.
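The rating idea can be sketched roughly as follows. This is not the scoring from the original player, which is not shown here; it is a minimal illustration under my own assumptions. The helper names (filename_words, word_groups, subtitle_rating, best_subtitle), the separator rule, and the "square of the word count" weighting are all hypothetical, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: lowercase a file name and split it into words,
// treating anything that is not a letter or digit as a separator.
function filename_words(string $name): array {
    return preg_split('/[^a-z0-9]+/', strtolower($name), -1, PREG_SPLIT_NO_EMPTY);
}

// Every run of consecutive words, grouped by word count
// (the same idea as natural_combi_words(), keyed by length instead of length - 1).
function word_groups(array $words): array {
    $groups = [];
    $n = count($words);
    for ($i = 0; $i < $n; $i++) {
        for ($len = 1; $i + $len <= $n; $len++) {
            $groups[$len][] = implode(' ', array_slice($words, $i, $len));
        }
    }
    return $groups;
}

// Assumed scoring rule: each distinct run found in both names adds the
// square of its word count, so long shared runs dominate the rating.
// Runs shorter than $minLen words are ignored ("eliminate the small ones").
function subtitle_rating(string $video, string $subtitle, int $minLen = 1): int {
    $a = word_groups(filename_words($video));
    $b = word_groups(filename_words($subtitle));
    $rating = 0;
    foreach ($a as $len => $runs) {
        if ($len < $minLen || !isset($b[$len])) {
            continue;
        }
        $common = array_intersect(array_unique($runs), $b[$len]);
        $rating += $len * $len * count($common);
    }
    return $rating;
}

// Return the candidate with the highest rating, or null if nothing matches at all.
function best_subtitle(string $video, array $subtitles): ?string {
    $best = null;
    $bestRating = 0;
    foreach ($subtitles as $candidate) {
        $r = subtitle_rating($video, $candidate);
        if ($r > $bestRating) {
            $best = $candidate;
            $bestRating = $r;
        }
    }
    return $best;
}
```

With the two file names from the example above, the shared runs are "laughing", "out", "loud", "2009" and the longer runs built from them, so that subtitle rates far higher than one that shares only the year. Raising $minLen to 2 is one way to stop single common words such as a year or "1080p" from influencing the result.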