How to solve an indexing and sorting problem

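I have a problem. Given a text file, I need to find the 5 most frequently occurring words. The program receives the file name. Output: the top 5 words in alphabetical order. The problem is that the index is not updated, which distorts the ordering. Please help. Thanks in advance. Here is the code: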

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

const int MAX = 100000;
string words[MAX];
int instances[MAX];
int cast = 0 ;

void insert (string input)
{
    for (int i = 0 ; i < cast; i++ )
    {
        if (input == words[i] )
        {
            instances[i]++;
            return ;
        }
    }
    if (cast < MAX)
    {
        words [cast] = input ;
        instances[cast] = 1;
        cast ++;
    }
    else
    {
        return ;
    }
 return ;
}
int FindTop (string & word)
{
    int TopCast = instances[0];
    int TopIndex = 0;
    for (int i = 1; i<cast; i++ )
    {
        if(instances[i] > TopCast )
        {
            TopCast = instances[i];
            TopIndex = i;
        }
    }
    instances[TopIndex] = 0;
    word = words[TopIndex ];
    return TopCast;
}
int main ()
{
    string word;
    string file;
    cin>>file;
    ifstream data (file);
    while(data >> word)
    {
        insert(word);
    }

    for (int i = 0; i < 5 ; i++)
    {
        cout<<FindTop(word)<<" "<<word<<endl;
    }
}

Update your FindTop() function as follows:

        int FindTop (string & word)
        {
            int TopCast = instances[0];
            int TopIndex = 0;
            for (int i = 1; i<cast; i++ )
            {
                if(instances[i] > TopCast )
                {
                    TopCast = instances[i];
                    TopIndex = i;
                }    
                else if(TopCast == instances[i])  
                {
                     //for making sure you get the smallest word (asc order) first if multiple words   
                     // have same frequency
                    if( words[TopIndex].compare(words[i]) > 0 )
                    {
                      TopCast = instances[i];
                      TopIndex = i;
                    }
                }
            }
            instances[TopIndex] = 0;
            word = words[TopIndex ];
            return TopCast;
        }

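The insert function works as expected. However, if the requirement really is to take the top 5 words by number of occurrences and then print those in alphabetical order, you may want to store those words in a vector and then sort it alphabetically with a custom string comparison function.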
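The "store in a vector, then sort alphabetically" step could look roughly like this. It is only a sketch: sortedAlphabetically is a hypothetical helper name, not part of the code in the question, and it assumes the top words have already been collected into a vector.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not in the original code): returns the words in
// ascending alphabetical order. The vector is taken by value, so the
// caller's copy is left untouched.
std::vector<std::string> sortedAlphabetically(std::vector<std::string> topWords)
{
    // std::string's operator< compares character codes, so uppercase
    // letters sort before lowercase ones ("Lorem" comes before "amet").
    std::sort(topWords.begin(), topWords.end(),
              [](const std::string& a, const std::string& b) { return a < b; });
    return topWords;
}
```

In main you would push each word returned by FindTop into a vector, call this helper once after the loop, and print the result. If a case-insensitive order is wanted, the lambda is the place to change.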

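If for some reason you do not get the top 5 words, check that the text file is in the correct directory and that you typed the correct file name into the console.

Please provide a short text file and the console output.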

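I would like to offer a different solution.

It is not based on your first draft, but is a more modern C++ solution that uses STL containers and algorithms.

I strongly recommend not using C-style arrays at all. Please use STL containers.

Now, back to your problem. We will basically split it into three tasks:

Read the file
Count the words
Sort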

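Reading the file into words is very simple. Just use the extraction operator (>>) to read each word from the text into a std::string. A word may carry non-alphabetic characters such as punctuation. We will remove those with std::regex_replace.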
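As a small illustration of the clean-up step, here is a sketch of a helper that removes the same punctuation characters. The name stripPunctuation is made up for this example, and the character set is an assumption about what counts as punctuation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes the punctuation characters . , ; : ! ?
// from a single token and returns the cleaned copy.
std::string stripPunctuation(const std::string& word)
{
    // Built once and reused on every call; constructing a std::regex is
    // comparatively expensive.
    static const std::regex punctuation{R"([.,;:!?])"};
    return std::regex_replace(word, punctuation, "");
}
```

Note that a token such as "amet,consetetur" (no space after the comma) becomes "ametconsetetur", because >> only splits on whitespace. If that matters for your input, replacing punctuation with a space before splitting might be the better choice.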

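Counting is also simple. We use the index operator [] of std::map. If the word is not yet in the std::map, it is inserted with a counter of 0, which is then incremented to 1. If the word already exists, its counter is simply incremented.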
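To make the behaviour of operator[] concrete, here is a minimal sketch of the counting step on its own. countWords is a hypothetical helper, and it assumes the words have already been split and cleaned.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: counts how often each word occurs.
std::map<std::string, std::size_t> countWords(const std::vector<std::string>& words)
{
    std::map<std::string, std::size_t> counter;
    for (const std::string& word : words) {
        // operator[] inserts a missing key with a value-initialized count
        // of 0 and returns a reference to it, so one increment handles
        // both the "new word" and the "known word" case.
        ++counter[word];
    }
    return counter;
}
```

Because std::map keeps its keys ordered, iterating over the result visits the words alphabetically, not by frequency.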

Since std::map is sorted by key value (the word) by default, we copy the word counts into a std::vector, sort that by count, and display the result.

See:

#include <iostream>
#include <sstream>
#include <iterator>
#include <string>
#include <array>
#include <algorithm>
#include <regex>
#include <map>
#include <iomanip>
#include <utility>
#include <vector>

std::istringstream sourceFile{R"(Lorem ipsum dolor sit amet,consetetur sadipscing elitr,sed diam nonumy eirmod tempor invidunt
ut labore et dolore magna aliquyam erat,sed diam voluptua. At vero
eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum
dolor sit amet,sed diam nonumy eirmod 
tempor invidunt ut labore et dolore magna aliquyam erat,sed diam 
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. 
Stet clita kasd gubergren,no sea takimata sanctus est Lorem ipsum 
dolor sit amet.)"
};


int main() {

    std::string word{};
    std::map<std::string,size_t> counter;

    // Read complete file into words
    while(sourceFile >> word) {
        // Replace special characters
        word = std::regex_replace(word,std::regex(R"([\.\,\;\:\!\?])"),"");
        // Count the occurrences of each word
        counter[word]++;
    }

    // We need to sort and will copy the word-counts into a vector
    std::vector<std::pair<size_t,std::string>> countedWords;
    for( auto const& [key,val] : counter ) {
        // Store as (count, word) so the count is the primary sort criterion
        countedWords.emplace_back(val,key);
    }

    // Sort with a lambda: higher counts first, ties in alphabetical order.
    // The comparator takes const references, as std::sort requires.
    std::sort(countedWords.begin(),countedWords.end(),[](const std::pair<size_t,std::string> &l,const std::pair<size_t,std::string> &r){
        return ((r.first == l.first) ? (l.second < r.second) : (r.first < l.first));});



    // Show result on screen
    int outputCounter{ 5 };
    // countedWords holds (count, word) pairs, so bind the names in that order
    for (const auto& [count,word] : countedWords) {
        std::cout << std::setw(20) << word << " --> " << count << "\n";
        if (0 >= --outputCounter) break;
    }

    // Output all words
    std::cout << "\n\nAll distinct words in alphabetical order:\n\n";

    // Write all words on screen.
    for (const auto& [word,count] : counter)   std::cout << word << "\n";

    return 0;
}

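Edit:

I added an alphabetical listing of all words at the end.

Also, please note: std::istringstream is a std::istream, just like a file stream. So "sourceFile" can be an opened std::ifstream, std::cin, or any other std::istream; there is no difference. To read from a file, open it with std::ifstream sourceFile("c:\\temp\\nameOfFile") and that's it.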

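Since your example is rather C-ish, I would like to show you my take on this problem.

I strongly recommend looking at the standard containers (especially vector and unordered_map) and the algorithm library. With these you end up with more concise, less error-prone code.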

#include <algorithm> // sort
#include <cstdlib> // EXIT_FAILURE,EXIT_SUCCESS
#include <fstream> // ifstream
#include <iostream> // cin,cout
#include <string> // string
#include <unordered_map> // unordered_map
#include <utility> // pair
#include <vector> // vector

using namespace std;

int main()
{
    cout << "Filename: ";
    string filename;
    cin >> filename;

    ifstream file{filename};
    // Check that the file was opened successfully.
    if (!file) {
        cout << "File cannot be opened for reading: " << filename << '\n';
        return EXIT_FAILURE;
    }

    // Count the words in the file.
    // unordered_map is an associative container that stores key-value pairs
    // with unique keys. We use this to store word-occurrence pairs.
    unordered_map<string,int> words;
    for (string word; file >> word;)
        // By default, if 'word' is not contained in 'words' it is inserted
        // with the default value of 0 (default value of ints). This allows
        // us to eliminate the special case when 'word' is not in 'words' yet.
        ++words[word];

    // Sort the word-occurrence pairs in descending order by occurrence.
    // vector is a dynamic array that we use to sort the word-occurrence pairs
    // because unordered_map cannot be sorted.
    vector<std::pair<string,int>> sorted_words{words.begin(),words.end()};
    // The sort algorithm takes the beginning and the end of the interval that we
    // want to sort. As a third argument we pass it a lambda function that tells
    // the algorithm how to order our word-occurrence pairs.
    sort(sorted_words.begin(),sorted_words.end(),[](const auto& a,const auto& b) {
        return a.second > b.second;
    });
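    // Sort the first 5 (most frequent) words in alphabetic order.
    // Clamp to the vector size so that a file with fewer than 5 distinct
    // words does not push the iterator past the end.
    const auto top = min<size_t>(5, sorted_words.size());
    sort(sorted_words.begin(), sorted_words.begin() + top, [](const auto& a, const auto& b) {
        return a.first < b.first;
    });

    for (size_t i = 0; i < top; ++i)
        cout << sorted_words[i].first << '\n';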

    return EXIT_SUCCESS;
}
