leetcode-819-Most Common Word (word frequency counting)

Problem description:

Given a paragraph and a list of banned words, return the most frequent word that is not in the list of banned words. It is guaranteed there is at least one word that isn't banned, and that the answer is unique.

Words in the list of banned words are given in lowercase, and free of punctuation. Words in the paragraph are not case sensitive. The answer is in lowercase.

Example:
Input: 
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
Output: "ball"
Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.

Note:

  • 1 <= paragraph.length <= 1000.
  • 1 <= banned.length <= 100.
  • 1 <= banned[i].length <= 10.
  • The answer is unique, and written in lowercase (even if its occurrences in paragraph may have uppercase symbols, and even if it is a proper noun.)
  • paragraph only consists of letters, spaces, or the punctuation symbols !?',;.
  • Different words in paragraph are always separated by a space.
  • There are no hyphens or hyphenated words.
  • Words only consist of letters, never apostrophes or other punctuation symbols.

Function to implement:

string mostCommonWord(string paragraph, vector<string>& banned)

Explanation:

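1. The problem gives a string containing a sentence made up of letters (both upper and lower case), spaces, and punctuation, plus a vector of banned words (all lowercase). We are asked to run a word-frequency analysis on the string and return the most frequent word that is not banned.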

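2. Once the problem statement is clear, this is easy to solve; it is an engineering-style problem where the work is in the implementation details rather than in the algorithm.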

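First, examine the characters of the string one by one. If a character is a letter, convert it to lowercase and record its position as i, then keep processing the following characters until one is not a letter, and record that position as j. Put the substring from i to j-1 into a vector.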
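The splitting step described above can be sketched on its own as follows. This is a minimal illustration rather than the code from this post: the helper name `splitWords` is hypothetical, and the sketch builds each word in a separate string instead of lowercasing the paragraph in place.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original solution): split a paragraph
// into lowercase words, treating every non-letter character as a separator.
std::vector<std::string> splitWords(const std::string& paragraph) {
    std::vector<std::string> words;
    const std::size_t n = paragraph.size();
    std::size_t i = 0;
    while (i < n) {
        if (std::isalpha(static_cast<unsigned char>(paragraph[i]))) {
            // i marks the start of a word; j advances to the first non-letter.
            std::size_t j = i;
            std::string word;
            while (j < n && std::isalpha(static_cast<unsigned char>(paragraph[j]))) {
                word += static_cast<char>(
                    std::tolower(static_cast<unsigned char>(paragraph[j])));
                ++j;
            }
            words.push_back(word);
            i = j;  // resume scanning at the separator (or at the end)
        } else {
            ++i;  // skip spaces and punctuation
        }
    }
    return words;
}
```

Checking `j < n` before reading `paragraph[j]` keeps the inner loop inside the string even when the paragraph ends with a letter.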

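Next, check each word in the vector; if it is not a banned word, increment its count. Here we use set.count() to test whether a word is banned, and a map to store each word together with its number of occurrences.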
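As a rough sketch of this counting step in isolation, assuming the words have already been split and lowercased (the helper name `countAllowed` is hypothetical and only for illustration):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: count every word that does not appear in the banned list.
std::map<std::string, int> countAllowed(const std::vector<std::string>& words,
                                        const std::vector<std::string>& banned) {
    // Copy the banned words into a set so each membership test is O(log n).
    std::set<std::string> banwords(banned.begin(), banned.end());
    std::map<std::string, int> wordnum;
    for (const std::string& word : words) {
        if (banwords.count(word) == 0) {
            ++wordnum[word];  // operator[] creates a missing entry with value 0
        }
    }
    return wordnum;
}
```

Because `std::map::operator[]` value-initializes a missing entry to 0, no explicit "is this word already present" check is needed before incrementing.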

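Finally, traverse the map once, continually updating the maximum count seen so far and recording the corresponding word; at the end, return that word.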
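The final scan might look like the sketch below. The helper name `mostFrequent` is hypothetical, and returning an empty string for an empty map is an assumption of this sketch; the problem guarantees at least one non-banned word, so that case should not arise there.

```cpp
#include <map>
#include <string>

// Hypothetical helper: return the key with the largest count,
// or an empty string if the map is empty.
std::string mostFrequent(const std::map<std::string, int>& wordnum) {
    int max1 = 0;
    std::string res;
    for (const auto& entry : wordnum) {
        if (entry.second > max1) {  // strictly greater: the first maximum wins
            max1 = entry.second;
            res = entry.first;
        }
    }
    return res;
}
```

With the strict `>` comparison, a tie would be resolved in favor of the first key in the map's sorted order; since the problem states the answer is unique, ties do not affect the result.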

The code is as follows:

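    #include <cctype>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>
    using namespace std; // needed outside LeetCode, which normally provides the headers and namespace

    string mostCommonWord(string paragraph, vector<string>& banned) 
    {
        int s1=paragraph.size(),i=0,j;
        vector<string>words;
        while(i<s1)
        {
            if(isalpha((unsigned char)paragraph[i]))
            {
                j=i+1;
                paragraph[i]=tolower((unsigned char)paragraph[i]);// convert to lowercase
                while(j<s1&&isalpha((unsigned char)paragraph[j]))// advance j, staying inside the string
                {
                    paragraph[j]=tolower((unsigned char)paragraph[j]);
                    j++;
                }
                words.push_back(paragraph.substr(i,j-i));// extract the substring [i, j) and append it to the vector
                i=(j+1);// position j is not a letter, so continue from j+1
            }
            else
                i++;
        }
        set<string>banwords(banned.begin(),banned.end());// copy the banned words into a set for fast lookup
        map<string,int>wordnum;// map from each word to its number of occurrences
        for(const auto& word:words)// count each word that is not banned
        {
            if(banwords.count(word)==0)
                wordnum[word]++;
        }
        int max1=0;
        string res;
        for(map<string,int>::iterator iter=wordnum.begin();iter!=wordnum.end();iter++)
        {
            if(iter->second>max1)// keep updating max1 and the corresponding word
            {
                max1=iter->second;
                res=iter->first;
            }
        }
        return res; 
    }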

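The code above ran in 7 ms when submitted. Because the server had received only a limited number of cpp submissions, no "beats X%" percentage was shown.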
