MongoDB聚合查询与MySQL SELECT field1 FROM表

我对MongoDB完全陌生，想比较NoSQL数据模型相对于其关系数据库计数器部分的查询性能。我把它写到了MongoDB shell中

// Make 10 businesses
// Each business has 10 locations
// Each location has 10 departments
// Each department has 10 teams
// Each team has 100 employees
(new Array(10)).fill(0).forEach(_=>
    db.businesses.insert({
        "name":"Business Name","locations":(new Array(10)).fill(0).map(_=>({
            "name":"Office Location","departments":(new Array(10)).fill(0).map(_=>({
                "name":"Department","teams":(new Array(10)).fill(0).map(_=>({
                    "name":"Team Name","employees":(new Array(100)).fill(0).map(_=>({
                        "age":Math.floor(Math.random()*100)
                    }))
                }))
            }))
        }))
    })
);

然后我通过编写以下语句尝试使用MySQL的EXPLAIN SELECT age,name,(and a few other fields) FROM employees WHERE age >= 50 ORDER BY age DESC：

db.businesses.aggregate([
    { $unwind: "$locations" },{ $unwind: "$locations.departments" },{ $unwind: "$locations.departments.teams" },{ $unwind: "$locations.departments.teams.employees" },{ $project: { _id: 0,age: "$locations.departments.teams.employees.age" } },{ $match: { "age": { $gte: 50 }} },{ $sort: {"age" : -1}}
]).explain("executionStats")

结果是：

“ errmsg”：“排序超出内存限制104857600字节，但没有选择进行外部排序。中止操作。传递allowDiskUse：true 选择加入。”，

因此，我删除了sort子句并尝试获取explain。但是结果是：

TypeError：db.businesses.aggregate（...）。explain不是函数

所以我的问题是：

最重要的是，我想知道SELECT age FROM employees WHERE age >= 50 ORDER BY age DESC与MongoDB的聚合查询计数器部分相比的性能差异。差不多一样吗？一个会比另一个更快或更有效吗？
或者，如何修复MongoDB查询，以便获得性能详细信息以与MySQL查询计数器部分进行比较？

员工是单一实体；因此，您可能不想在部门，位置和团队的丰富结构中为团队成员age建模。最好有一个单独的employees集合，只需执行以下操作即可：

db.businesses.aggregate([
{$match: {"age": {$gt: 50} }},{$sort: {"age": -1} }
]);

在您的businesses收藏集中，您可以拥有：

{ teams: [ {name: "T1",employees: [ "E1","E34" ]} ] }

或者，尝试以下操作：

db.businesses.aggregate([ your pipeline],{allowDiskUse:true});

OP的设置为10个业务-> 10个位置-> 10个部门-> 10个团队-> 100个雇员。前三个展开会产生10000倍的数据爆炸，但最后一个会超出此数据100倍。我们可以使用$filter来缩小点击量：

db.businesses.aggregate([
{ $unwind: "$locations" },{ $unwind: "$locations.departments" },{ $unwind: "$locations.departments.teams" },{$project: {
        XX: {$filter: {
                    input: "$locations.departments.teams.employees",as: "z",cond: {$gte: [ "$$z.age",50] }
            }}
    }},{$unwind: "$XX"},{$sort: {"XX.age":-1}}])

通过如下修改查询，我能够在1.5秒内获得结果，而没有任何索引：

db.businesses.aggregate([
    {
        $unwind: "$locations"
    },{
        $unwind: "$locations.departments"
    },{
        $unwind: "$locations.departments.teams"
    },{
        $unwind: "$locations.departments.teams.employees"
    },{
        $match: {
            "locations.departments.teams.employees.age": {
                $gte: 50
            }
        }
    },{
        $project: {
            _id: 0,age: "$locations.departments.teams.employees.age"
        }
    },{
        $group: {
            _id: "$age"
        }
    },age: "$_id"
        }
    },{
        $sort: {
            "age": - 1
        }
    }
],{
    explain: false
})

您最好将SUMMARIES_DIRECTORY_PATH = os.path.join(current_dir,"summaries") NODECOUNTS_DIRECTORY_PATH = os.path.join(current_dir,"node_counts") summaries_path_list = os.listdir(SUMMARIES_DIRECTORY_PATH) nodecounts_path_list = os.listdir(NODECOUNTS_DIRECTORY_PATH) coop_ratios_list = [] for summary_path in summaries_path_list: coop_ratio_list = [] abs_summaries_path = os.path.join(SUMMARIES_DIRECTORY_PATH,summary_path) summaries = os.listdir(abs_summaries_path) //THIS LINE IS BREAKING// for n in range(len(summaries)): abs_summary_path_generation = os.path.join(abs_summaries_path,"summary" + str(n) + ".csv") summary = pd.read_csv(abs_summary_path_generation) coop_ratio = np.mean(summary.Cooperation_rating) coop_ratio_list.append(coop_ratio) coop_ratios_list.append(coop_ratio_list)移到第一个管道，因为聚合框架会在第一个管道之后丢失索引，所以我想您也不需要解散那些数组。

还有另一种解决总体问题的方法，尽管不是OP问题的苹果。目标是查找所有年龄大于等于50的年龄并进行排序。下面是一个“几乎”这样做并抛出loc,dept,team的示例，以防万一您也想知道如何获得它，但是您可以删除所有行以仅获取emps。现在，这是未排序的，但是可以争论的是，数据库引擎在排序方面不会比客户端做得更好，而且所有数据无论如何都必须通过网络传输。客户可以使用更复杂的编码技巧来挖掘age字段并对其进行排序。

c = db.foo.aggregate([
{$project: {XX:
  {$map: {input: "$locations",as:"z",in:
          {$map: {input: "$$z.departments",as:"z2",in:
                  {$map: {input: "$$z2.teams",as:"z3",in:
                          {loc: "$$z.name",// remove if you want
                           dept: "$$z2.name",// remove if you want
                           team: "$$z3.name",// remove if you want
                           emps: {$filter: {input: "$$z3.employees",as: "z4",cond: {$gt: [ "$$z4.age",50] }
                                    }}
                          }
                      }}
              }}
      }}
    }}
]);

ages = [];

c.forEach(function(biz) {
    biz['XX'].forEach(function(locs) {
        locs.forEach(function(depts) {
            depts.forEach(function(teams) {
                teams['emps'].forEach(function(emp) {
                    ages.push(emp['age']);
                                    });
                            });
                    });
            });
    });

print( ages.sort(function(a,b){return b-a}) );

99,98,97,96,95,94,92,84,81,78,77,76,72,71,67,66,65,64,63,62,61,59,57,56,55,54,52,51

在运行MongoDB 4.0的MacBook Pro上，我们看到的集合如下：

Collection            Count   AvgSize          Unz  Xz  +Idx     TotIdx  Idx/doc
--------------------  ------- -------- -G--M------  --- ---- ---M------  -------
                 foo       10   2238682     22386820  4.0    0      16384    0

鉴于随机年龄介于0到100之间，每个位置/部门/团队的年龄都大于等于50并不奇怪，并且返回的字节总数约为一半。但是请注意，设置agg的总时间（不返回所有字节）约为700毫秒。

697 millis to agg; 0.697
found 10
tot bytes 11536558

MongoDB聚合查询与MySQL SELECT field1 FROM表

henryjiangyun 回答：MongoDB聚合查询与MySQL SELECT field1 FROM表

大家都在问