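There are a few things to handle, for example a phrase such as "Terrible food, terrible service, bad place". If "Terrible" and "terrible" should be counted as the same word, case has to be normalized, and punctuation has to be removed as well. Both can be handled with regular expressions. Likewise, if several words share the same highest count, each of them should be shown. The query below does all of this.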
with review as
-- CTE (Oracle: subquery factoring) holding test data. To be replaced by the actual table.
(select 'Terrible food,terrible service,bad,bad place' || chr(13) || chr(10) || 'Just stay away!!' review_text from dual),review_words as
-- Replace punctuation and control characters with spaces, then reduce multiple spaces to a single space.
(select regexp_replace(regexp_replace(review_text,'[[:punct:][:cntrl:]]',' '),'\s{2,}',' ') rwords
from review
),word_list as
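-- From the result above, extract the individual words and convert them to lower case.
-- Count words ('[^ ]+') rather than spaces so the last word is kept when the text has no trailing punctuation.
( select lower(regexp_substr(rwords,'[^ ]+',1,level)) words
from review_words connect by level <= regexp_count(rwords,'[^ ]+')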
)
-- Count each word and return the highest-ranked word(s); tied words are all returned.
select word,cnt
from ( -- Rank the word counts
select word,cnt,rank() over(order by cnt desc) rnk
from (-- Count the occurrences of each word.
select words word,count(*) cnt
from word_list
group by words
)
)
where rnk = 1;
See the fiddle here.
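The query above works on a single test string. When the text comes from a real table with many rows, the word split has to run once per row. A minimal sketch of one way to do that follows, assuming Oracle 12c or later (for `CROSS APPLY`). The `reviews` CTE with its `review_id` and `review_text` columns is a hypothetical stand-in for the actual table, not something from the original answer.

```sql
-- Hypothetical sample data; replace this CTE with the actual table.
with reviews as (
  select 1 review_id, 'Terrible food,terrible service' review_text from dual
  union all
  select 2, 'Bad place, terrible!' from dual
),
review_words as (
  -- Replace each run of punctuation, control characters and whitespace
  -- with a single space, then trim leading and trailing spaces.
  select review_id,
         trim(regexp_replace(review_text,
                             '[[:punct:][:cntrl:][:space:]]+', ' ')) rwords
  from reviews
),
word_list as (
  -- For every row, generate one position per word and extract that word
  -- in lower case. CROSS APPLY lets the row generator see rw.rwords.
  select lower(regexp_substr(rw.rwords, '[^ ]+', 1, n.pos)) word
  from review_words rw
  cross apply (
    select level pos
    from dual
    connect by level <= regexp_count(rw.rwords, '[^ ]+')
  ) n
  where rw.rwords is not null  -- skip rows that contain no words at all
)
-- Count each word across all rows and keep the top-ranked word(s).
select word, cnt
from (
  select word, count(*) cnt, rank() over (order by count(*) desc) rnk
  from word_list
  group by word
)
where rnk = 1;
```

Because the row generator is correlated to each row, the `CONNECT BY` never mixes words from different reviews, which is the problem a bare `connect by level <= ...` over a multi-row table runs into.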
Maybe this will help...
I found this ARTICLE.
I adapted it a little so that it finds the most common email address in a CLOB column. Here is the query:
with emails as (
select
cast(trim(
regexp_substr(t.toaddress,'[^,]+',1,levels.column_value)
) as varchar2(320)) as email,domain,id
from t,table(cast(multiset(
select level from dual
connect by level <= length (regexp_replace(t.toaddress,'[^,]+')) + 1
) as sys.OdciNumberList)) levels
)
select email from (select email,count(email) ce
from emails
group by email) where ce = (select max(ce)
from (select email,count(email) ce
from emails
group by email));
Here is the DEMO.
But all the credit goes to Connor McDonald.