PHP: filtering text for banned words



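Problem description

We run a C2C website and discourage the sale of branded products on it. We have built a database of brand words such as Nike and D&G, and written an algorithm that filters product information for these words and disables a product if it contains any of them.

Our current algorithm removes all spaces and special characters from the supplied text and matches the result against the words in the database. The following cases need to be caught by the algorithm, and caught efficiently: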

    i am nike world

    i have n ikee shoes

    i have nikeeshoes

    i sell i-phone casings

    i sell iphone-casings

    you can have iphone

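The problem is that it also catches the following:

    rapiD Garment factor (matches D&G)

    rosNIK Electronics (matches Nike)

How can such false matches be prevented while still catching the genuine cases efficiently?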

Edit

Here is the code, for those who find code easier to follow:

// Strip HTML tags and entities such as &amp;
$orignal_txt = preg_replace('/&.{0,}?;/', '', strip_tags($orignal_txt));
// \W matches any non-word character, so this removes spaces and punctuation
$orignal_txt_nospace = preg_replace('/\W/', '', $orignal_txt);
{
    $qry_kws = array("nike", "iphone", "d&g");
    foreach ($qry_kws as $rs_kw)
    {
        // Keyword with non-word characters removed, e.g. "d&g" becomes "dg"
        $no_space_db_kw = preg_replace('/\W/', '', $rs_kw);
        if (stristr($orignal_txt_nospace, $rs_kw))
        {
            $ipr_banned_keywords[] = strtolower($rs_kw);
        }
        else if (stristr($orignal_txt_nospace, $no_space_db_kw))
        {
            $ipr_banned_keywords[] = strtolower($rs_kw);
        }
    }
}
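
To make the false matches concrete, here is a minimal sketch that isolates the strip-then-substring check described in the question. The helper name `findBannedSubstrings` is hypothetical and does not appear in the original post; the sketch assumes `\W` is the intended pattern.

```php
<?php
// Hypothetical helper (not from the original post): removes every non-word
// character from the text and from each keyword, then runs a
// case-insensitive substring search on the flattened strings.
function findBannedSubstrings(string $text, array $keywords): array
{
    $flat = preg_replace('/\W/', '', $text);
    $hits = [];
    foreach ($keywords as $kw) {
        $flatKw = preg_replace('/\W/', '', $kw);
        if ($flatKw !== '' && stripos($flat, $flatKw) !== false) {
            $hits[] = strtolower($kw);
        }
    }
    return $hits;
}

$keywords = ['nike', 'iphone', 'd&g'];
foreach (['i have nikeeshoes', 'rapiD Garment factor', 'rosNIK Electronics'] as $text) {
    echo $text, ' => ', json_encode(findBannedSubstrings($text, $keywords)), PHP_EOL;
}
```

Once word boundaries are gone, "rapiD Garment" flattens to a string containing "dg" and "rosNIK Electronics" to one containing "nike", which is why both legitimate names are flagged.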

Recommended answer

Just playing around (not for production use).

$data = array(
  "i am nike world",
  "i have n ikee shoes",
  "i have nikeeshoes",
  "i sell i-phone casings",
  "i sell iphone-casings",
  "you can have iphone",
  "rapiD Garment factor",
  "rosNIK Electronics",
  "Buy you self N I K E",
  "B*U*Y I*P*H*O*N*E BABY",
  "My Phone Is not available");


$ban = array("nike","d&g","iphone");

Example 1

$filter = new BrandFilterIterator($data);
$filter->parseBan($ban);
foreach ( $filter as $word ) {
 echo $word, PHP_EOL;
}

Output 1

rapiD Garment factor
rosNIK Electronics
My Phone Is not available

Example 2

$filter = new BrandFilterIterator($data,true); //reverse filter
$filter->parseBan($ban);
foreach ( $filter as $word ) {
 echo $word, " " , json_encode($word->getBan()) ,  PHP_EOL;
}

Output 2

i am nike world ["nike"]
i have n ikee shoes ["nike"]
i have nikeeshoes ["nike"]
i sell i-phone casings ["iphone"]
i sell iphone-casings ["iphone"]
you can have iphone ["iphone"]
Buy you self N I K E ["nike"]
B*U*Y I*P*H*O*N*E BABY ["iphone"]
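
In "Buy you self N I K E" no single word resembles a banned term, so that line appears to be caught by the spacing check at the end of `checkMetrix` rather than by the per-word comparison. The sketch below computes the ratio that check relies on; `wordCharRatio` is a hypothetical helper name, not part of the answer.

```php
<?php
// Hypothetical helper (not part of the answer): percentage of the text that
// remains once every non-word character (spaces, punctuation) is removed.
function wordCharRatio(string $text): int
{
    $stripped = preg_replace('/\W/', '', $text);
    return (int) ceil(strlen($stripped) / strlen($text) * 100);
}

foreach (['Buy you self N I K E', 'rapiD Garment factor'] as $text) {
    $ratio = wordCharRatio($text);
    // The answer only runs its substring search when the ratio is below 75.
    printf("%-22s %3d%% %s\n", $text, $ratio, $ratio < 75 ? 'suspicious spacing' : 'normal spacing');
}
```

Only 14 of the 20 characters in "Buy you self N I K E" are word characters, so it falls below the 75% cut-off and is searched with separators removed. "rapiD Garment factor" keeps 18 of 20 and is never flattened, which avoids the false D&G match.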

Classes used

class BrandFilterIterator extends FilterIterator {
    private $words = array();
    private $reverse = false;

    function __construct(array $words, $reverse = false) {
        $this->reverse = $reverse;
        // Wrap every input string in a Word object so it can carry its own verdict
        foreach ( $words as $word ) {
            $this->words[] = new Word($word);
        }
        parent::__construct(new ArrayIterator($this->words));
    }

    // Check every word against every banned term
    function parseBan(array $ban) {
        foreach ( $ban as $item ) {
            foreach ( $this->words as $word ) {
                $word->checkMetrix($item);
            }
        }
    }

    // With $reverse set, only the rejected words are yielded
    public function accept(): bool {
        if ($this->reverse) {
            return $this->getInnerIterator()->current()->accept() ? false : true;
        }
        return $this->getInnerIterator()->current()->accept();
    }
}


class Word {
    private $ban = array();
    private $word;
    private $parts;
    private $accept = true;

    function __construct($word) {
        $this->word = $word;
        $this->parts = explode(" ", $word);
    }

    function __toString() {
        return $this->word;
    }

    // The text with every non-word character removed
    function getTrim() {
        return preg_replace('/\W/', '', $this->word);
    }

    function accept() {
        return $this->accept;
    }

    function getBan() {
        return array_unique($this->ban);
    }

    function reject($ban = null) {
        $ban === null or $this->ban[] = $ban;
        $this->accept = false;
        return $this->accept;
    }

    function checkMetrix($ban) {
        $ban = strtolower($ban);
        foreach ( $this->parts as $part ) {
            if ($part === '') {
                continue; // consecutive spaces yield empty parts; skip them to avoid dividing by zero
            }
            $part = strtolower($part);
            // Threshold: length of the banned term as a percentage of the word length
            $t = ceil(strlen($ban) / strlen($part) * 100);
            // similar_text() stores the similarity percentage in $p
            $s = similar_text($part, $ban, $p);
            // Comparing $part with itself always gives 0, so the $l == 0 test below is always true
            $l = levenshtein($part, $part);
            if (ceil($p) >= $t || ($t == 100 && $p >= 75 && $l == 0)) {
                $this->reject($ban);
            }
        }
        // Detect bad use of spaces: under 75% word characters means the text is heavily padded
        if ($this->word !== '' && ceil(strlen($this->getTrim()) / strlen($this->word) * 100) < 75) {
            if (stripos($this->getTrim(), $ban) !== false) {
                $this->reject($ban);
            }
        }
        return $this->accept;
    }
}
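
The per-word test in `checkMetrix` is easier to follow with concrete numbers. The sketch below uses a hypothetical helper, `tokenMetrics`, which is not part of the answer; it returns the threshold `$t` and the rounded-up `similar_text` percentage for a single word and banned term.

```php
<?php
// Hypothetical helper (not part of the answer): returns the two numbers that
// checkMetrix() compares for one word and one banned term.
function tokenMetrics(string $part, string $ban): array
{
    $part = strtolower($part);
    $ban  = strtolower($ban);
    // Threshold: banned-term length as a percentage of the word length.
    $t = (int) ceil(strlen($ban) / strlen($part) * 100);
    // similar_text() writes the similarity percentage into $p.
    similar_text($part, $ban, $p);
    return ['threshold' => $t, 'percent' => (int) ceil($p)];
}

foreach ([['nikeeshoes', 'nike'], ['rosnik', 'nike'], ['ikee', 'nike']] as [$word, $ban]) {
    $m = tokenMetrics($word, $ban);
    printf("%-10s vs %-4s threshold=%d percent=%d\n", $word, $ban, $m['threshold'], $m['percent']);
}
```

With the code as listed, "nikeeshoes" is rejected because its similarity (58) reaches the threshold (40), while "rosnik" passes because 60 is below 67. "ikee" has the same length as "nike" (threshold 100) and 75% similarity, so it is rejected only by the second clause of the condition.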
