员工是单一实体;因此,您可能不想在部门,位置和团队的丰富结构中为团队成员age
建模。最好有一个单独的employees
集合,只需执行以下操作即可:
db.businesses.aggregate([
{$match: {"age": {$gt: 50} }},{$sort: {"age": -1} }
]);
在您的businesses
收藏集中,您可以拥有:
{ teams: [ {name: "T1",employees: [ "E1","E34" ]} ] }
或者,尝试以下操作:
db.businesses.aggregate([ your pipeline],{allowDiskUse:true});
OP的设置为10个业务-> 10个位置-> 10个部门-> 10个团队-> 100个雇员。前三个展开会产生10000倍的数据爆炸,但最后一个会超出此数据100倍。我们可以使用$filter
来缩小点击量:
db.businesses.aggregate([
{ $unwind: "$locations" },{ $unwind: "$locations.departments" },{ $unwind: "$locations.departments.teams" },{$project: {
XX: {$filter: {
input: "$locations.departments.teams.employees",as: "z",cond: {$gte: [ "$$z.age",50] }
}}
}},{$unwind: "$XX"},{$sort: {"XX.age":-1}}])
,
通过如下修改查询,我能够在1.5秒内获得结果,而没有任何索引:
db.businesses.aggregate([
{
$unwind: "$locations"
},{
$unwind: "$locations.departments"
},{
$unwind: "$locations.departments.teams"
},{
$unwind: "$locations.departments.teams.employees"
},{
$match: {
"locations.departments.teams.employees.age": {
$gte: 50
}
}
},{
$project: {
_id: 0,age: "$locations.departments.teams.employees.age"
}
},{
$group: {
_id: "$age"
}
},age: "$_id"
}
},{
$sort: {
"age": - 1
}
}
],{
explain: false
})
,
您最好将SUMMARIES_DIRECTORY_PATH = os.path.join(current_dir,"summaries")
NODECOUNTS_DIRECTORY_PATH = os.path.join(current_dir,"node_counts")
summaries_path_list = os.listdir(SUMMARIES_DIRECTORY_PATH)
nodecounts_path_list = os.listdir(NODECOUNTS_DIRECTORY_PATH)
coop_ratios_list = []
for summary_path in summaries_path_list:
coop_ratio_list = []
abs_summaries_path = os.path.join(SUMMARIES_DIRECTORY_PATH,summary_path)
summaries = os.listdir(abs_summaries_path) //THIS LINE IS BREAKING//
for n in range(len(summaries)):
abs_summary_path_generation = os.path.join(abs_summaries_path,"summary" + str(n) + ".csv")
summary = pd.read_csv(abs_summary_path_generation)
coop_ratio = np.mean(summary.Cooperation_rating)
coop_ratio_list.append(coop_ratio)
coop_ratios_list.append(coop_ratio_list)
移到第一个管道,因为聚合框架会在第一个管道之后丢失索引,所以我想您也不需要解散那些数组。
,
还有另一种解决总体问题的方法,尽管不是OP问题的苹果。目标是查找所有年龄大于等于50的年龄并进行排序。下面是一个“几乎”这样做并抛出loc,dept,team
的示例,以防万一您也想知道如何获得它,但是您可以删除所有行以仅获取emps
。现在,这是未排序的,但是可以争论的是,数据库引擎在排序方面不会比客户端做得更好,而且所有数据无论如何都必须通过网络传输。客户可以使用更复杂的编码技巧来挖掘age
字段并对其进行排序。
c = db.foo.aggregate([
{$project: {XX:
{$map: {input: "$locations",as:"z",in:
{$map: {input: "$$z.departments",as:"z2",in:
{$map: {input: "$$z2.teams",as:"z3",in:
{loc: "$$z.name",// remove if you want
dept: "$$z2.name",// remove if you want
team: "$$z3.name",// remove if you want
emps: {$filter: {input: "$$z3.employees",as: "z4",cond: {$gt: [ "$$z4.age",50] }
}}
}
}}
}}
}}
}}
]);
ages = [];
c.forEach(function(biz) {
biz['XX'].forEach(function(locs) {
locs.forEach(function(depts) {
depts.forEach(function(teams) {
teams['emps'].forEach(function(emp) {
ages.push(emp['age']);
});
});
});
});
});
print( ages.sort(function(a,b){return b-a}) );
99,98,97,96,95,94,92,84,81,78,77,76,72,71,67,66,65,64,63,62,61,59,57,56,55,54,52,51
在运行MongoDB 4.0的MacBook Pro上,我们看到的集合如下:
Collection Count AvgSize Unz Xz +Idx TotIdx Idx/doc
-------------------- ------- -------- -G--M------ --- ---- ---M------ -------
foo 10 2238682 22386820 4.0 0 16384 0
鉴于随机年龄介于0到100之间,每个位置/部门/团队的年龄都大于等于50并不奇怪,并且返回的字节总数约为一半。但是请注意,设置agg的总时间(不返回所有字节)约为700毫秒。
697 millis to agg; 0.697
found 10
tot bytes 11536558
本文链接:https://www.f2er.com/3013613.html