MongoDB之路-聚合操作

mongo的聚合操作和mysql的查询类比

SQL 操作/函数	mongodb聚合操作
where	$match
group by	$group
having	$match
select	$project
order by	$sort
limit	$limit
sum()	$sum
count()	$sum
join	$lookup

下面一些例子和sql做对比

下面是使用数据库的一个基本结构

数据链接（提取码:gqh2）

json复制代码{ 
  _id: ObjectId("5dbe7a545368f69de2b4d36e"),
  street: '493 Hilll Curve',
  city: 'Champlinberg',
  state: 'Texas',
  country: 'Malaysia',
  zip: '24344-1715',
  phone: '425.956.7743 x4621',
  name: 'Destinee Schneider',
  userId: 3573,
  orderDate: 2019-03-26T03:20:08.805Z,
  status: 'created',
  shippingFee: Decimal128("8.00"),
  orderLines: [ 
     { 
       product: 'Refined Fresh Tuna',
       sku: '2057',
       qty: 25,
       price: Decimal128("56.00"),
       cost: Decimal128("46.48") },
     {
       product: 'Refined Concrete Ball',
       sku: '1732',
       qty: 61,
       price: Decimal128("47.00"),
       cost: Decimal128("47") 
     },
  ],
  total: Decimal128("407") 
}

先来一些操作案例

sql复制代码select sum(total) from orders

db.orders.aggregate({$group:{_id:null,total:{$sum:"$total"}}})
结果：{ _id: null, 'total': Decimal128("44019609") }

sql复制代码select count(1) from orders

db.orders.aggregate({$group:{_id:null,total:{$sum:1}}})
结果：{ _id: null, total: 100000 }

sql复制代码select count(1) from orders group by status

db.orders.aggregate({$group:{_id:"$status",total:{$sum:1}}})
结果：
{ _id: 'created', total: 20087 }
{ _id: 'shipping', total: 20017 }
{ _id: 'cancelled', total: 19978 }
{ _id: 'completed', total: 20015 }
{ _id: 'fulfilled', total: 19903 }

sql复制代码select count(1) from orders group by status having count(1) > 20000

db.orders.aggregate([
    {$group:{_id:{status:'$status'},total:{$sum:1}}},
    {$match:{total:{$gte:20000}}}
 ])
结果：
{ _id: { status: 'created' }, total: 20087 }
{ _id: { status: 'shipping' }, total: 20017 }
{ _id: { status: 'completed' }, total: 20015 }

sql复制代码select count(1) total
from orders 
group by status,year(orderDate),month(orderDate)
order by year(orderDate),month(orderDate)

db.orders.aggregate([
  {
    $group:{
      _id:{
         status:'$status',
         orderDate:{
           year:{$year:"$orderDate"},
           month:{$month:"$orderDate"}
         }
       },
       total:{$sum:1}
     }
  },{
    $sort:{"_id.orderDate.year":1,"_id.orderDate.month":1}
  }
])
结果：
{ _id: { status: 'cancelled', orderDate: { year: 2019, month: 1 } }, total: 2066 }
{ _id: { status: 'shipping', orderDate: { year: 2019, month: 1 } }, total: 2058 }
{ _id: { status: 'completed', orderDate: { year: 2019, month: 1 } }, total: 2068 }
{ _id: { status: 'created', orderDate: { year: 2019, month: 1 } }, total: 2047 }
{ _id: { status: 'fulfilled', orderDate: { year: 2019, month: 1 } }, total: 2076 }
{ _id: { status: 'cancelled', orderDate: { year: 2019, month: 2 } }, total: 1816 }
{ _id: { status: 'created', orderDate: { year: 2019, month: 2 } }, total: 1817 }
{ _id: { status: 'shipping', orderDate: { year: 2019, month: 2 } }, total: 1844 }
{ _id: { status: 'completed', orderDate: { year: 2019, month: 2 } }, total: 1813 }
{ _id: { status: 'fulfilled', orderDate: { year: 2019, month: 2 } }, total: 1913 }
......

sql复制代码select *
from(
    select month(orderDate) month,name,status
    from orders
) order
where month = 2

db.orders.aggregate([{$project:{month:{$month:"$orderDate"},name:1,status:1}},{$match:{month:2}}]) 
结果：
{ _id: ObjectId("5dbe7a542411dc9de6429190"),name: 'Kris Hansen',status: 'cancelled',month: 2 }
{ _id: ObjectId("5dbe7a542411dc9de6429191"),name: 'Constantin Wuckert',status: 'completed',month: 2 }
{ _id: ObjectId("5dbe7a545368f69de2b4d375"),name: 'Reed Jerde',status: 'fulfilled',month: 2 }
{ _id: ObjectId("5dbe7a54cd023b9de4efc1d2"),name: 'Lyric Hodkiewicz',status: 'cancelled',month: 2 }
.....

sql复制代码select count(*) from orders where month(orderDate) >= 3 group by month(orderDate)

db.orders.aggregate([
    {$match:{$expr:{$gte:[{$month:"$orderDate"},3]}}},
    {$group:{_id:{month:{$month:"$orderDate"}},count:{$sum:1}}}
 ]);
#结果
{ _id: { month: 6 }, count: 9915 }
{ _id: { month: 4 }, count: 10083 }
{ _id: { month: 10 }, count: 9928 }
{ _id: { month: 5 }, count: 10142 }
{ _id: { month: 8 }, count: 10194 }
{ _id: { month: 9 }, count: 9779 }
{ _id: { month: 7 }, count: 10240 }
{ _id: { month: 3 }, count: 10201 }

MongoDB聚合(Aggregate)

MongoDB 中聚合(aggregate)主要用于处理数据(诸如统计平均值，求和等)，并返回计算后的数据结果

一个aggregate由多个阶段(Stage)组成。上一阶段产生的结果会作为下一阶段的输入，所以也会被形象的称为流水线(Pipeline)。

表达式：处理输入文档并输出。表达式是无状态的，只能用于计算当前聚合管道的文档，不能处理其它的文档。

这里我们介绍一下聚合框架中常用的几个操作：

$project：修改输入文档的结构。可以用来重命名、增加或删除域，也可以用于创建计算结果以及嵌套文档。
$match：用于过滤数据，只输出符合条件的文档。$match使用MongoDB的标准查询操作。
$limit：用来限制MongoDB聚合管道返回的文档数。
$skip：在聚合管道中跳过指定数量的文档，并返回余下的文档。
$unwind：将文档中的某一个数组类型字段拆分成多条，每条包含数组中的一个值。
$group：将集合中的文档分组，可用于统计结果。
$sort：将输入文档排序后输出。
$geoNear：输出接近某一地理位置的有序文档。

下面是一个aggregate的基本处理流程

db.collection.aggregate() 可以用多个构件创建一个管道，对于一连串的文档进行处理。这些构件包括：筛选操作的match、映射操作的project、分组操作的group、排序操作的sort、限制操作的limit、和跳过操作的skip。
db.collection.aggregate()使用了MongoDB内置的原生操作，聚合效率非常高,支持类似于SQL Group By操作的功能。
每个阶段管道限制为100MB的内存。如果一个节点管道超过这个极限,MongoDB将产生一个错误。为了能够在处理大型数据集,可以设置allowDiskUse为true来在聚合管道节点把数据写入临时文件。这样就可以解决100MB的内存的限制。
db.collection.aggregate()可以作用在分片集合，但结果不能输在分片集合，MapReduce可以作用在分片集合，结果也可以输在分片集合。
db.collection.aggregate()方法可以返回一个指针（cursor），数据放在内存中，直接操作。跟Mongo shell 一样指针操作。
db.collection.aggregate()输出的结果只能保存在一个文档中，BSON Document大小限制为16M。可以通过返回指针解决，版本2.6中后面：db.collect.aggregate()方法返回一个指针，可以返回任何结果集的大小。

$count
返回文档统计数

先看一些非聚合操作中的count使用方法

sql复制代码#对应查询出来的是orders这个集合中的所有数据总和
db.orders.count();
#结果:{"result": 100000}
#对应查出来自Malaysia这个国家的订单总和
db.orders.find({country:"Malaysia"}).count()
#结果:{"result": 392}

使用聚合操作中$count来汇总行数

sql复制代码#使用聚合查出来自Malaysia这个国家的订单总和,并且返回给counts字段
db.orders.aggregate([
    {$match:{country:"Malaysia"}},
    {$count:"counts"}
])
#结果:{"counts": 392}
#下面是两种不同的写法只是在 $match的时候有所区别(可以先体验以下)
db.orders.aggregate([
    {$match:{country:{$eq:"Malaysia"}}},
    {$count:"counts"}
])
db.orders.aggregate([
    {$match:{$expr:{$eq:["$country","Malaysia"]}}},
    {$count:"counts"}
])
#结果:{"counts": 392}

除此以外可以灵活使用group+$sum来实现$count

sql复制代码#对应查询出来的是orders这个集合中的所有数据总和,并且返回给counts字段
db.orders.aggregate({$group:{_id:null,counts:{$sum:1}}})
#结果:{"_id": null,"counts": 100000}
#使用聚合查出来自Malaysia这个国家的订单总和,并且返回给counts字段
db.orders.aggregate([
    {$match:{country:{$eq:"Malaysia"}}},
    {$group:{_id:null,counts:{$sum:1}}}
])
#结果:{"_id": null,"counts": 392}

$group
按照指定表达式对文档进行分组

$group使用的基本语法:

1	json复制代码{ $group: { _id: <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }

_id+表达式用来做分组条件，也就是_id后面的内容与sql中group by后面的表达式的用途相同
_id后面的字段+accumulator操作符与sql中做完group by后在select后面的的聚合函数用途相同，例如:sum()、avg()、max()、min()

例如：

sql复制代码db.orders.aggregate({$group:{_id:"$country",total:{$sum:"$total"}}})
#结果
{ _id: 'Guam', total: Decimal128("182335") }
{ _id: 'El Salvador', total: Decimal128("159475") }
{ _id: 'Saint Martin', total: Decimal128("163267") }
{ _id: 'Botswana', total: Decimal128("189330") }
{ _id: 'San Marino', total: Decimal128("174200") }
{ _id: 'Czech Republic', total: Decimal128("178602") }
{ _id: 'Estonia', total: Decimal128("172816") }
.......
#上面的mql相当于sql中的
select sum(total) from orders group by country

$group阶段的内存限制为100M。默认情况下，如果stage超过此限制，$group将产生错误。但是，要允许处理大型数据集，请将allowDiskUse选项设置为true以启用$group操作以写入临时文件。

名称	描述	类比sql
$avg	计算均值	avg
$first	返回每组第一个文档，如果有排序，按照排序，如果没有按照默认的存储的顺序的第一个文档。	limit 0,1
$last	返回每组最后一个文档，如果有排序，按照排序，如果没有按照默认的存储的顺序的最后个文档。	-
$max	根据分组，获取集合中所有文档对应值得最大值。	max
$min	根据分组，获取集合中所有文档对应值得最小值。	min
$push	将指定的表达式的值添加到一个数组中。	-
$addToSet	将表达式的值添加到一个集合中（无重复值，无序）。	-
$sum	计算总和	sum
$stdDevPop	返回输入值的总体标准偏差（population standard deviation）	-
$stdDevSamp	返回输入值的样本标准偏差（the sample standard deviation）	-

下面我们按照以上文档依次用一下每一个表达式

$avg计算平均值

json复制代码--计算每个国家的每个订单的平均消费
db.orders.aggregate({$group:{
        _id:"$country",
        avgMoney:{$avg:"$total"}
    }})
--结果    
{ _id: 'Saudi Arabia',avgMoney: Decimal128("433.4898419864559819413092550790068") }
{ _id: 'New Caledonia',avgMoney: Decimal128("441.9833729216152019002375296912114") }
{ _id: 'Congo',avgMoney: Decimal128("451.8834951456310679611650485436893") }
{ _id: 'Turkey',avgMoney: Decimal128("425.7422434367541766109785202863962") }
{ _id: 'Cuba',avgMoney: Decimal128("437.2074074074074074074074074074074") }
{ _id: 'Uruguay',avgMoney: Decimal128("434.1564792176039119804400977995110") }
......

$first返回第一个文档

json复制代码--根据国家分组，每组第一笔订单的订单商品列表
db.orders.aggregate({$group:{
    _id:"$country",
    firstOrderLines:{$first:"$orderLines"}
}})
--结果
{ _id: 'Malta',firstOrderLines: [ 
  { product: 'Sleek Frozen Table',sku: '368',qty: 30,price: Decimal128("24.00"),cost:Decimal128("19.44") },
  { product: 'Intelligent Metal',sku: '179',qty: 62,price: Decimal128("91.00"),cost: Decimal128("90.09") },
  { product: 'Intelligent Granite',sku: '9',qty: 31,price: Decimal128("68.00"),cost: Decimal128("61.88") },
  { product: 'Licensed Cotton',sku: '6846',qty: 9,price: Decimal128("16.00"),cost: Decimal128("15.68") } 
] }
{ _id: 'Papua New Guinea',firstOrderLines: [ 
  { product: 'Fantastic Wooden',sku: '4333',qty: 32,price: Decimal128("58.00"),cost: Decimal128("57.42") }
  ...
] }
......

--根据国家分组，每组第一笔订单的订单商品列表里面的第一条商品信息
db.orders.aggregate({$group:{
    _id:"$country",
    firstOrder:{$first:{$first:"$orderLines"}}
}})
---结果
{ _id: 'Malta',firstOrder:  
 { product: 'Sleek Frozen Table',sku: '368',qty: 30,price: Decimal128("24.00"),cost:Decimal128("19.44") }
}
{ _id: 'Papua New Guinea',firstOrder:
  { product: 'Fantastic Wooden',sku: '4333',qty: 32,price: Decimal128("58.00"),cost: Decimal128("57.42") }
}
......

$last返回最后一个文档

json复制代码--根据每个国家分组，每笔最后一个订单的orderDate
db.orders.aggregate([{$group:{
    _id:"$country",
    lastOrderDate:{$last:"$orderDate"}
}}])
--结果
{ _id: 'Micronesia', lastOrderDate: 2019-01-15T07:23:18.002Z }
{ _id: 'Malaysia', lastOrderDate: 2019-05-15T20:16:56.644Z }
{ _id: 'San Marino', lastOrderDate: 2019-09-29T06:10:07.292Z }

$max和$min：最大值和最小值

json复制代码--根据年月分组，查出每组第一笔订单时间和最后一组订单时间
db.orders.aggregate({$group:{
    _id:{year:{$year:"$orderDate"},month:{$month:"$orderDate"}},
    maxDate:{$max:"$orderDate"},
    minDate:{$min:"$orderDate"}
}})
--结果
{ _id: { year: 2019, month: 1 }, maxDate: 2019-01-31T23:53:57.308Z, minDate: 2019-01-01T00:03:59.661Z }
{ _id: { year: 2019, month: 4 }, maxDate: 2019-04-30T23:57:03.352Z, minDate: 2019-04-01T00:02:12.224Z }
{ _id: { year: 2019, month: 3 }, maxDate: 2019-03-31T23:55:10.312Z, minDate: 2019-03-01T00:13:53.761Z }
{ _id: { year: 2019, month: 7 }, maxDate: 2019-07-31T23:55:51.718Z, minDate: 2019-07-01T00:00:07.540Z }

$push将指定值添加到一个数组当中可以push到一个已经存在的数组当中，如果不存在会创建这样一个数组

json复制代码--根据城市、年、月分组，将每组的下单时间push到一个新的 orderDates 数组当中
db.orders.aggregate({$group:{
    _id:{city:"$city",year:{$year:"$orderDate"},month:{$month:"$orderDate"}},
    orderDates:{$push:"$orderDate"},
}})
--结果
{ _id: { city: 'Kennedifurt', year: 2019, month: 9 }, orderDates: [ 2019-09-30T10:25:19.763Z ] }
{ _id: { city: 'South Jewelstad', year: 2019, month: 1 }, orderDates: [ 2019-01-06T19:59:03.380Z ] }
{ _id: { city: 'Germanmouth', year: 2019, month: 9 }, orderDates: [ 2019-09-25T07:45:54.260Z ] }
{ _id: { city: 'Fayebury', year: 2019, month: 8 }, orderDates: [ 2019-08-12T11:08:37.815Z ] }
{ _id: { city: 'New Lailaport', year: 2019, month: 1 }, orderDates: [ 2019-01-19T12:28:56.978Z ] }
{ _id: { city: 'Port Bennyside', year: 2019, month: 2 }, orderDates: [ 2019-02-25T01:18:21.657Z ] }
{ _id: { city: 'Abernathymouth', year: 2019, month: 6 }, orderDates: 
   [ 2019-06-03T18:03:21.149Z,
     2019-06-13T23:35:32.994Z,
     2019-06-18T11:32:22.229Z ] 
}

$addToSet将指定值添加到一个集合当中集合是无序的并且会去重

json复制代码--根据月份分组，将每个月都下单过的国家都添加到 countrySet 中去
db.orders.aggregate({
    $group:{
        _id:{year:{$year:"$orderDate"},month:{$month:"$orderDate"}},
        countrySet:{$addToSet:"$country"}
    }
})
--结果  
{
    "_id": {
      "year": 2019,
      "month": 1
    },
    "countrySet": ["French Guiana", "Germany", "Poland", "Comoros", "Portugal", "Fiji", "France", "Benin", "Greece", "Belarus", "Vietnam", "Ireland", "Vanuatu", "Netherlands Antilles", "Iceland", "Palestinian Territory", "Malawi", "Brazil", "Libyan Arab Jamahiriya", "Kuwait", "Liechtenstein", "Suriname", "Uganda", "New Caledonia", "Bolivia", "Nicaragua", "Burundi", "Uzbekistan", "Jamaica", "South Georgia and the South Sandwich Islands", "Tajikistan", "Mexico", "Singapore", "Sri Lanka", "Antarctica (the territory South of 60 deg S)", "Myanmar", "Tonga", "Slovenia", "Latvia", "Ukraine", "Oman", "Saint Helena", "Bosnia and Herzegovina", "Hungary", "Aruba", "Jordan", "Solomon Islands", "Mozambique", "Svalbard & Jan Mayen Islands", "Taiwan", "Cyprus", "Thailand", "Equatorial Guinea", "Belize", "Niger", "Israel", "Hong Kong", "Senegal", "Costa Rica", "Sierra Leone", "Kiribati", "Lesotho", "Nepal", "Serbia", "Barbados", "Spain", "Czech Republic", "Saint Martin", "Saint Pierre and Miquelon", "Togo", "Somalia", "Northern Mariana Islands", "Maldives", "British Indian Ocean Territory (Chagos Archipelago)", "Montenegro", "Cote d'Ivoire", "United Arab Emirates", "Guernsey", "Bulgaria", "Netherlands", "Greenland", "Niue", "Colombia", "Egypt", "Madagascar", "Brunei Darussalam", "Iraq", "Mauritius", "French Polynesia", "Jersey", "Canada", "Grenada", "Honduras", "New Zealand", "Cocos (Keeling) Islands", "Mayotte", "Virgin Islands, British", "Finland", "Macedonia", "Cook Islands", "Micronesia", "Christmas Island", "Turks and Caicos Islands", "Falkland Islands (Malvinas)", "El Salvador", "Estonia", "Eritrea", "Afghanistan", "San Marino", "Malaysia", "Cambodia", "Anguilla", "Philippines", "Zambia", "Republic of Korea", "Mauritania", "Yemen", "South Africa", "Gambia", "Namibia", "Peru", "Samoa", "Qatar", "Guinea", "Monaco", "Mongolia", "Cayman Islands", "Bouvet Island (Bouvetoya)", "Romania", "Sweden", "Guam", "Guyana", "Japan", "Bangladesh", "Djibouti", "Reunion", "Central African Republic", "Martinique", "Sudan", "Norway", "Guadeloupe", "Malta", "Papua New Guinea", "Macao", "Tunisia", "Iran", "Ghana", "Trinidad and Tobago", "Syrian Arab Republic", "French Southern Territories", "Russian Federation", "Botswana", "Pakistan", "Luxembourg", "Ethiopia", "Austria", "Rwanda", "Holy See (Vatican City State)", "American Samoa", "Tanzania", "Morocco", "Lao People's Democratic Republic", "Faroe Islands", "Bahrain", "China", "Indonesia", "Ecuador", "Tuvalu", "Panama", "Algeria", "Gibraltar", "Nigeria", "Kyrgyz Republic", "Chile", "Cape Verde", "Palau", "Armenia", "Dominican Republic", "Bhutan", "Liberia", "India", "Mali", "Switzerland", "Isle of Man", "Argentina", "Virgin Islands, U.S.", "Swaziland", "Timor-Leste", "Azerbaijan", "Bahamas", "Guatemala", "Saint Lucia", "Sao Tome and Principe", "United States Minor Outlying Islands", "Australia", "Italy", "Paraguay", "Tokelau", "Gabon", "Wallis and Futuna", "Cameroon", "Norfolk Island", "Guinea-Bissau", "Chad", "Zimbabwe", "Nauru", "Pitcairn Islands", "Georgia", "Kenya", "Bermuda", "Kazakhstan", "Democratic People's Republic of Korea", "Puerto Rico", "Croatia", "Antigua and Barbuda", "Seychelles", "Marshall Islands", "Burkina Faso", "Denmark", "United Kingdom", "Dominica", "Albania", "Angola", "Slovakia (Slovak Republic)", "Western Sahara", "Belgium", "Saudi Arabia", "Turkey", "Congo", "Cuba", "Uruguay", "Montserrat", "United States of America", "Lebanon", "Saint Vincent and the Grenadines", "Saint Kitts and Nevis", "Saint Barthelemy", "Haiti", "Moldova", "Heard Island and McDonald Islands", "Lithuania", "Turkmenistan", "Venezuela", "Andorra"]
  },
  {
    "_id": {
      "year": 2019,
      "month": 9
    },
    "countrySet": ["Germany", "Poland", "French Guiana", "Fiji", "France", "Comoros", "Portugal", "Benin", "Greece", "Belarus", "Ireland", "Vietnam", "Brazil", "Malawi", "Vanuatu", "Netherlands Antilles", "Palestinian Territory", "Iceland", "Kuwait", "Libyan Arab Jamahiriya", "Liechtenstein", "New Caledonia", "Suriname", "Uganda", "Bolivia", "Uzbekistan", "Burundi", "Nicaragua", "Tajikistan", "Jamaica", "South Georgia and the South Sandwich Islands", "Sri Lanka", "Mexico", "Singapore", "Antarctica (the territory South of 60 deg S)", "Tonga", "Myanmar", "Slovenia", "Latvia", "Oman", "Saint Helena", "Ukraine", "Bosnia and Herzegovina", "Aruba", "Jordan", "Hungary", "Mozambique", "Solomon Islands", "Svalbard & Jan Mayen Islands", "Thailand", "Taiwan", "Cyprus", "Equatorial Guinea", "Belize", "Niger", "Senegal", "Hong Kong", "Israel", "Kiribati", "Costa Rica", "Sierra Leone", "Lesotho", "Saint Martin", "Spain", "Barbados", "Nepal", "Togo", "Maldives", "Czech Republic", "Somalia", "Saint Pierre and Miquelon", "Serbia", "Northern Mariana Islands", "Montenegro", "British Indian Ocean Territory (Chagos Archipelago)", "Cote d'Ivoire", "United Arab Emirates", "Guernsey", "Niue", "Bulgaria", "Netherlands", "Egypt", "Colombia", "Greenland", "Brunei Darussalam", "Madagascar", "Mauritius", "Iraq", "Canada", "French Polynesia", "Jersey", "Grenada", "Cocos (Keeling) Islands", "New Zealand", "Honduras", "Virgin Islands, British", "Mayotte", "Cook Islands", "Finland", "Macedonia", "Micronesia", "Turks and Caicos Islands", "Christmas Island", "Estonia", "Falkland Islands (Malvinas)", "El Salvador", "Eritrea", "Malaysia", "San Marino", "Afghanistan", "Anguilla", "Cambodia", "Zambia", "Republic of Korea", "Mauritania", "Philippines", "South Africa", "Gambia", "Yemen", "Qatar", "Peru", "Namibia", "Guinea", "Samoa", "Cayman Islands", "Monaco", "Mongolia", "Bouvet Island (Bouvetoya)", "Romania", "Sweden", "Guam", "Guyana", "Djibouti", "Japan", "Bangladesh", "Reunion", "Central African Republic", "Sudan", "Norway", "Martinique", "Guadeloupe", "Papua New Guinea", "Malta", "Tunisia", "Macao", "Iran", "Ghana", "Syrian Arab Republic", "Trinidad and Tobago", "French Southern Territories", "Botswana", "Luxembourg", "Russian Federation", "Pakistan", "Ethiopia", "Holy See (Vatican City State)", "Panama", "Austria", "Rwanda", "American Samoa", "Faroe Islands", "Tanzania", "Morocco", "Lao People's Democratic Republic", "Ecuador", "China", "Indonesia", "Bahrain", "Algeria", "Tuvalu", "Gibraltar", "Nigeria", "Kyrgyz Republic", "Chile", "Palau", "Cape Verde", "Bhutan", "Dominican Republic", "Armenia", "Mali", "Isle of Man", "Liberia", "India", "Switzerland", "Argentina", "Virgin Islands, U.S.", "Timor-Leste", "Swaziland", "Azerbaijan", "United States Minor Outlying Islands", "Saint Lucia", "Bahamas", "Guatemala", "Australia", "Sao Tome and Principe", "Tokelau", "Paraguay", "Italy", "Wallis and Futuna", "Gabon", "Cameroon", "Guinea-Bissau", "Chad", "Norfolk Island", "Zimbabwe", "Nauru", "Georgia", "Kenya", "Pitcairn Islands", "Bermuda", "Kazakhstan", "Democratic People's Republic of Korea", "Croatia", "Puerto Rico", "Antigua and Barbuda", "Seychelles", "Marshall Islands", "Burkina Faso", "Dominica", "Denmark", "Albania", "United Kingdom", "Angola", "Slovakia (Slovak Republic)", "Western Sahara", "Belgium", "Turkey", "Congo", "Saudi Arabia", "Uruguay", "Cuba", "United States of America", "Montserrat", "Lebanon", "Saint Kitts and Nevis", "Saint Vincent and the Grenadines", "Saint Barthelemy", "Haiti", "Moldova", "Lithuania", "Heard Island and McDonald Islands", "Turkmenistan", "Venezuela", "Andorra"]
  }
.......

$sum计算总和

json复制代码--根据月份分组，获取每组的收入总和 sumTotal
db.orders.aggregate({
    $group:{
        _id:{year:{$year:"$orderDate"},month:{$month:"$orderDate"}},
        sumTotal:{$sum:"$total"}
    }
})
--结果
{ _id: { year: 2019, month: 2 }, sumTotal: Decimal128("4072808") }
{ _id: { year: 2019, month: 10 }, sumTotal: Decimal128("4356471") }
{ _id: { year: 2019, month: 5 }, sumTotal: Decimal128("4460433") }
......

$stdDevPop返回输入值的总体标准偏差

json复制代码--根据月份分组,计算总体准偏差计算
db.orders.aggregate({
    $group:{
        _id:{year:{$year:"$orderDate"},month:{$month:"$orderDate"}},
        popTotal:{$stdDevPop:"$total"}
    }
})
--结果
{ _id: { year: 2019, month: 2 }, popTotal: 189.3064965965138 }
{ _id: { year: 2019, month: 10 }, popTotal: 187.19676293125292 }
{ _id: { year: 2019, month: 5 }, popTotal: 189.54277980510432 }
{ _id: { year: 2019, month: 8 }, popTotal: 189.52305549485735 }
{ _id: { year: 2019, month: 6 }, popTotal: 189.99641948294692 }
{ _id: { year: 2019, month: 1 }, popTotal: 188.89723701416594 }
{ _id: { year: 2019, month: 4 }, popTotal: 189.33635941008336 }
{ _id: { year: 2019, month: 3 }, popTotal: 190.39465578257668 }
{ _id: { year: 2019, month: 7 }, popTotal: 189.01641050584374 }
{ _id: { year: 2019, month: 9 }, popTotal: 188.10379143822877 }

$stdDevSamp返回输入值的样本标准偏差

json复制代码--根据月份分组,计算样本准偏差计算
db.orders.aggregate({
    $group:{
        _id:{year:{$year:"$orderDate"},month:{$month:"$orderDate"}},
        sampTotal:{$stdDevSamp:"$total"}
    }
})
--结果
{ _id: { year: 2019, month: 2 }, sampTotal: 189.31678247750685 }
{ _id: { year: 2019, month: 9 }, sampTotal: 188.1134099175866 }
{ _id: { year: 2019, month: 7 }, sampTotal: 189.02564049879336 }
{ _id: { year: 2019, month: 3 }, sampTotal: 190.40398862519802 }
{ _id: { year: 2019, month: 5 }, sampTotal: 189.55212494401323 }
{ _id: { year: 2019, month: 4 }, sampTotal: 189.34574899869335 }
{ _id: { year: 2019, month: 1 }, sampTotal: 188.90639411415503 }
{ _id: { year: 2019, month: 8 }, sampTotal: 189.53235199281477 }
{ _id: { year: 2019, month: 6 }, sampTotal: 190.00600146946147 }
{ _id: { year: 2019, month: 10 }, sampTotal: 187.20619136123352 }

$match
接受一个指定查询条件的文档。查询语法与读操作查询语法相同。

基本的语法{ $match: { <query> } }

在实际应用中尽可能将$match放在管道的前面位置。这样有两个好处：

可以快速将不需要的文档过滤掉，以减少管道的工作量
如果再投射和分组之前执行$match，查询可以使用索引。

json复制代码--类似于in查询
db.orders.aggregate({
    $match:{
        country:{$in:["Romania", "Sweden", "Guam", "Guyana"]}
    }
})
--结果：查出这几个国家的订单

json复制代码--范围查询
db.orders.aggregate({
    $match:{
        orderDate:{$gte:ISODate("2019-02-01"),$lt:ISODate("2019-02-04")}
    }
})
--结果:查出 2019-02-01 到 2019-02-03这三天的所有订单

$expr使用聚合表达式

json复制代码--查询3月份以及往后的数据 - 根据月份分组 - sum订单数
db.orders.aggregate([
    {$match:{$expr:{$gte:[{$month:"$orderDate"},3]}}},
    {$group:{_id:{month:{$month:"$orderDate"}},count:{$sum:1}}}
 ]);
--结果
{ _id: { month: 7 }, count: 10240 }
{ _id: { month: 5 }, count: 10142 }
{ _id: { month: 6 }, count: 9915 }
{ _id: { month: 4 }, count: 10083 }
{ _id: { month: 10 }, count: 9928 }
{ _id: { month: 9 }, count: 9779 }
{ _id: { month: 3 }, count: 10201 }
{ _id: { month: 8 }, count: 10194 }

$mod使用取模运算符

json复制代码--查询total属性后面是00结尾的订单
db.orders.aggregate([
    {$match:{total:{$mod:[100,0]}}}
 ]);
--结果
{ _id: ObjectId("5dbe7a575368f69de2b4d4db"),
  street: '5929 Elroy Points',
  city: 'Retaberg',
  state: 'Utah',
  country: 'Cote d\'Ivoire',
  zip: '73722-0034',
  phone: '113.509.1520',
  name: 'Sanford Runte',
  userId: 7843,
  orderDate: 2019-02-21T20:26:32.458Z,
  status: 'completed',
  shippingFee: Decimal128("7.00"),
  orderLines: 
   [ { product: 'Fantastic Steel Shoes',
       sku: '1374',
       qty: 82,
       price: Decimal128("15.00"),
       cost: Decimal128("13.35") },
     { product: 'Sleek Frozen Salad',
       sku: '2698',
       qty: 79,
       price: Decimal128("41.00"),
       cost: Decimal128("33.21") },
     { product: 'Intelligent Granite Mouse',
       sku: '17',
       qty: 55,
       price: Decimal128("54.00"),
       cost: Decimal128("50.76") },
     { product: 'Handcrafted Wooden Chicken',
       sku: '2079',
       qty: 4,
       price: Decimal128("17.00"),
       cost: Decimal128("17") } ],
  total: Decimal128("500") }
{ _id: ObjectId("5dbe7a575368f69de2b4d50c"),
  street: '6159 Vandervort Camp',
  city: 'South Bobby',
  state: 'Montana',
  country: 'Guernsey',
  zip: '55141',
  phone: '173.672.8440 x661',
  name: 'Jovan Rice',
  userId: 3526,
  orderDate: 2019-09-14T21:05:45.049Z,
  status: 'shipping',
  shippingFee: Decimal128("9.00"),
  orderLines: 
   [ { product: 'Small Metal Sausages',
       sku: '8130',
       qty: 11,
       price: Decimal128("80.00"),
       cost: Decimal128("67.2") },
     { product: 'Intelligent Rubber Chicken',
       sku: '3775',
       qty: 61,
       price: Decimal128("10.00"),
       cost: Decimal128("8") },
     { product: 'Generic Rubber Table',
       sku: '7102',
       qty: 36,
       price: Decimal128("10.00"),
       cost: Decimal128("8.5") } ],
  total: Decimal128("100") }
......

$regex使用正则表达式匹配

json复制代码--以184开头的手机号的订单数量
db.orders.aggregate([
    {$match:{ phone: { $regex: /^184/ }}},
    {$count:"counts"}
 ]);
--结果
{"counts": 55}

$unwind
将数组拆分成单独的文档

格式

lua复制代码{
  $unwind:
  {
      path: <field path>,
      includeArrayIndex: <string>,  
      preserveNullAndEmptyArrays: <boolean> 
  } 
}

includeArrayIndex：可选,一个新字段的名称用于存放元素的数组索引。该名称不能以$开头。

preserveNullAndEmptyArrays：可选,默认为false，若为true,如果path没有对应的字段或者对应的数组size为0，则$unwind输出文档，默认false不输出。

json复制代码--筛选一条数据，将数组拆分
db.orders.aggregate([
    {$match:{_id:ObjectId("5dbe7aa650fc769de3e1b551")}},
    {$unwind:"$orderLines"},
 ]);
--结果
{ _id: ObjectId("5dbe7aa650fc769de3e1b551"),
  street: '3340 Marielle Manors',
  city: 'New Maymie',
  state: 'Connecticut',
  country: 'Malawi',
  zip: '22434-3104',
  phone: '184.544.4826 x4858',
  name: 'Annette Langworth',
  userId: 9830,
  orderDate: 2019-01-23T11:56:14.972Z,
  status: 'shipping',
  shippingFee: Decimal128("8.00"),
  orderLines: 
   { product: 'Sleek Granite Gloves',
     sku: '6176',
     qty: 31,
     price: Decimal128("74.00"),
     cost: Decimal128("71.04") },
  total: Decimal128("313") }
{ _id: ObjectId("5dbe7aa650fc769de3e1b551"),
  street: '3340 Marielle Manors',
  city: 'New Maymie',
  state: 'Connecticut',
  country: 'Malawi',
  zip: '22434-3104',
  phone: '184.544.4826 x4858',
  name: 'Annette Langworth',
  userId: 9830,
  orderDate: 2019-01-23T11:56:14.972Z,
  status: 'shipping',
  shippingFee: Decimal128("8.00"),
  orderLines: 
   { product: 'Licensed Soft Cheese',
     sku: '2702',
     qty: 70,
     price: Decimal128("55.00"),
     cost: Decimal128("53.9") },
  total: Decimal128("313") }
......

$project
从文档中指定想要的字段和不想要的字段

格式

{ $project: { <specification(s)> } }

shell复制代码specifications有以下形式：

<field>: <1 or true> 包含该字段
<field>: <0 or false> 不包含该字段

_id: <0 or false>        是否指定_id字段

json复制代码--如果有一个属性为或几个属性为1，那么只显示这一个或几个属性 + _id
db.orders.aggregate({
    $project:{name:1}
})
--结果
{ _id: ObjectId("5dbe7a545368f69de2b4d36e"), name: 'Destinee Schneider' }
{ _id: ObjectId("5dbe7a545368f69de2b4d36f"), name: 'Ashlynn Sipes' }
{ _id: ObjectId("5dbe7a54cd023b9de4efc1cc"), name: 'Genoveva Bauch' }
{ _id: ObjectId("5dbe7a542411dc9de6429190"), name: 'Kris Hansen' }

--如果有一个属性为或几个属性为0，那么显示除了这个一个或几个属性的其它所有属性
db.orders.aggregate({
    $project:{orderLines:0}
})
--结果
{
    "_id": {"$oid": "5dbe7a545368f69de2b4d36e"},
    "city": "Champlinberg",
    "country": "Malaysia",
    "name": "Destinee Schneider",
    "orderDate": {"$date": "2019-03-26T03:20:08.805Z"},
    "phone": "425.956.7743 x4621",
    "shippingFee": {"$numberDecimal": 8.00},
    "state": "Texas",
    "status": "created",
    "street": "493 Hilll Curve",
    "total": {"$numberDecimal": 407},
    "userId": 3573,
    "zip": "24344-1715"
},
{
    "_id": {"$oid": "5dbe7a545368f69de2b4d36f"},
    "city": "Linwoodburgh",
    "country": "United States of America",
    "name": "Ashlynn Sipes",
    "orderDate": {"$date": "2019-07-18T07:21:53.530Z"},
    "phone": "508.326.5494 x1218",
    "shippingFee": {"$numberDecimal": 7.00},
    "state": "Indiana",
    "status": "shipping",
    "street": "39476 Lacey Harbor",
    "total": {"$numberDecimal": 439},
    "userId": 2500,
    "zip": "84551"
}
......

--只展示嵌套属性
db.orders.aggregate({
    $project:{"orderLines.price":1}
})
或者
db.orders.aggregate({
    $project:{orderLines:{price:1}}
})
--结果
{ _id: ObjectId("5dbe7a542411dc9de6429193"),
  orderLines: 
   [ { price: Decimal128("75.00") },
     { price: Decimal128("64.00") },
     { price: Decimal128("34.00") },
     { price: Decimal128("98.00") },
     { price: Decimal128("88.00") },
     { price: Decimal128("20.00") },
     { price: Decimal128("59.00") },
     { price: Decimal128("20.00") },
     { price: Decimal128("90.00") },
     { price: Decimal128("45.00") },
     { price: Decimal128("42.00") },
     { price: Decimal128("28.00") } ] }
{ _id: ObjectId("5dbe7a5450fc769de3e19d20"),
  orderLines: 
   [ { price: Decimal128("51.00") },
     { price: Decimal128("10.00") },
     { price: Decimal128("63.00") },
     { price: Decimal128("12.00") },
     { price: Decimal128("37.00") },
     { price: Decimal128("43.00") },
     { price: Decimal128("39.00") },
     { price: Decimal128("68.00") },
     { price: Decimal128("21.00") } ] }
......

$cond-if-then-else的使用相当于SQL中的case-when-then-else

$$REMOVE是在满足这个条件的时候移除这个属性

json复制代码--不是7月的文档，移除这个属性
db.orders.aggregate({
    $project:{
        name:1,
        orderDate:{
            $cond: {
                   if: { $ne: [ {"$month":"$orderDate"}, 7 ] },
                   then: "$$REMOVE",
                   else: "$orderDate"
            }
        }
    }
})
--结果
{ _id: ObjectId("5dbe7a545368f69de2b4d36e"), name: 'Destinee Schneider' }
{ _id: ObjectId("5dbe7a545368f69de2b4d36f"), name: 'Ashlynn Sipes', orderDate: 2019-07-18T07:21:53.530Z }
{ _id: ObjectId("5dbe7a54cd023b9de4efc1cc"), name: 'Genoveva Bauch' }
{ _id: ObjectId("5dbe7a542411dc9de6429190"), name: 'Kris Hansen' }

映射到一个属性包含多个属性

json复制代码--使用substr截取第一个字母，使用strLenCP取name的长度
db.orders.aggregate({
    $project:{
        _id: 0,
        nm:{
            name:"$name",
            firstLetter:{$substr:["$name",0,1]},
            nameLenth:{$strLenCP:"$name"}
        }
    }
})
--结果
{ nm: { name: 'Destinee Schneider', firstLetter: 'D', nameLenth: 18 } }
{ nm: { name: 'Ashlynn Sipes', firstLetter: 'A', nameLenth: 13 } }
{ nm: { name: 'Genoveva Bauch', firstLetter: 'G', nameLenth: 14 } }
{ nm: { name: 'Kris Hansen', firstLetter: 'K', nameLenth: 11 } }
{ nm: { name: 'Dudley Kertzmann', firstLetter: 'D', nameLenth: 16 } }
......

将多个属性的值映射到一个数组当中

json复制代码db.orders.aggregate({
    $project:{
        _id: 0,
        msg:[
            "$name","$orderDate","$orderLines.price"
        ]
    }
})
--结果
{msg: 
   [ 
     'Gudrun Stamm',
     2019-09-10T01:00:00.679Z,
     [ 
       Decimal128("17.00"),
       Decimal128("91.00"),
       Decimal128("51.00"),
       Decimal128("10.00"),
       Decimal128("18.00"),
       Decimal128("46.00"),
       Decimal128("69.00"),
       Decimal128("18.00"),
       Decimal128("89.00"),
       Decimal128("99.00") 
	] 
  ] 
}
{ msg: 
   [ 
     'Jalon Erdman',
     2019-03-06T08:30:55.042Z,
     [ 
       Decimal128("37.00"),
       Decimal128("91.00"),
       Decimal128("88.00"),
       Decimal128("20.00"),
       Decimal128("75.00"),
       Decimal128("46.00") 
	] 
  ] 
}
{ msg: 
   [ 
     'Mossie Ankunding',
     2019-05-25T09:40:13.662Z,
     [ 
       Decimal128("14.00"),
       Decimal128("49.00"),
       Decimal128("38.00"),
       Decimal128("55.00"),
       Decimal128("20.00") 
	 ] 
   ] 
}
{ msg: 
   [ 
     'Jorge Toy',
     2019-09-28T23:07:35.137Z,
     [ 
       Decimal128("71.00"),
       Decimal128("62.00"),
       Decimal128("59.00"),
       Decimal128("43.00"),
       Decimal128("55.00"),
       Decimal128("65.00"),
       Decimal128("57.00") 
     ] 
   ] 
}
......

$limit
限制条数，获取前n条数据

json复制代码db.orders.aggregate({
    $limit:2
})
--结果
{ _id: ObjectId("5dbe7a545368f69de2b4d36e"),
  street: '493 Hilll Curve',
  city: 'Champlinberg',
  state: 'Texas',
  country: 'Malaysia',
  zip: '24344-1715',
  phone: '425.956.7743 x4621',
  name: 'Destinee Schneider',
  userId: 3573,
  orderDate: 2019-03-26T03:20:08.805Z,
  status: 'created',
  shippingFee: Decimal128("8.00"),
  orderLines: 
   [ { product: 'Refined Fresh Tuna',
       sku: '2057',
       qty: 25,
       price: Decimal128("56.00"),
       cost: Decimal128("46.48") },
     { product: 'Intelligent Wooden Towels',
       sku: '5674',
       qty: 72,
       price: Decimal128("84.00"),
       cost: Decimal128("68.88") },
     { product: 'Refined Steel Bacon',
       sku: '5009',
       qty: 8,
       price: Decimal128("53.00"),
       cost: Decimal128("50.35") } ],
  total: Decimal128("407") }
{ _id: ObjectId("5dbe7a545368f69de2b4d36f"),
  street: '39476 Lacey Harbor',
  city: 'Linwoodburgh',
  state: 'Indiana',
  country: 'United States of America',
  zip: '84551',
  phone: '508.326.5494 x1218',
  name: 'Ashlynn Sipes',
  userId: 2500,
  orderDate: 2019-07-18T07:21:53.530Z,
  status: 'shipping',
  shippingFee: Decimal128("7.00"),
  orderLines: 
   [ { product: 'Fantastic Soft Soap',
       sku: '6274',
       qty: 71,
       price: Decimal128("91.00"),
       cost: Decimal128("89.18") },
     { product: 'Intelligent Steel Chair',
       sku: '8278',
       qty: 13,
       price: Decimal128("67.00"),
       cost: Decimal128("62.31") },
     { product: 'Small Rubber Shoes',
       sku: '3534',
       qty: 60,
       price: Decimal128("76.00"),
       cost: Decimal128("71.44") } ],
  total: Decimal128("439") }
......

$skip
跳过前n行数据查询

json复制代码--查询第2、3条
db.orders.aggregate([{
        $skip:1
    },{
        $limit:2
}])
--结果
{ _id: ObjectId("5dbe7a545368f69de2b4d36f"),
  street: '39476 Lacey Harbor',
  city: 'Linwoodburgh',
  state: 'Indiana',
  country: 'United States of America',
  zip: '84551',
  phone: '508.326.5494 x1218',
  name: 'Ashlynn Sipes',
  userId: 2500,
  orderDate: 2019-07-18T07:21:53.530Z,
  status: 'shipping',
  shippingFee: Decimal128("7.00"),
  orderLines: 
   [ { product: 'Fantastic Soft Soap',
       sku: '6274',
       qty: 71,
       price: Decimal128("91.00"),
       cost: Decimal128("89.18") },
     { product: 'Incredible Concrete Chips',
       sku: '3756',
       qty: 6,
       price: Decimal128("18.00"),
       cost: Decimal128("15.12") },
     { product: 'Intelligent Steel Chair',
       sku: '8278',
       qty: 13,
       price: Decimal128("67.00"),
       cost: Decimal128("62.31") },
     { product: 'Small Rubber Shoes',
       sku: '3534',
       qty: 60,
       price: Decimal128("76.00"),
       cost: Decimal128("71.44") } ],
  total: Decimal128("439") }
{ _id: ObjectId("5dbe7a54cd023b9de4efc1cc"),
  street: '699 Harvey Row',
  city: 'Electamouth',
  state: 'South Dakota',
  country: 'Burundi',
  zip: '61826',
  phone: '(936) 449-4255 x58095',
  name: 'Genoveva Bauch',
  userId: 8302,
  orderDate: 2019-03-15T13:53:48.925Z,
  status: 'shipping',
  shippingFee: Decimal128("5.00"),
  orderLines: 
   [ { product: 'Intelligent Soft Salad',
       sku: '3711',
       qty: 85,
       price: Decimal128("86.00"),
       cost: Decimal128("76.54") },
     { product: 'Generic Cotton Ball',
       sku: '2112',
       qty: 44,
       price: Decimal128("21.00"),
       cost: Decimal128("19.32") },
     { product: 'Rustic Plastic Keyboard',
       sku: '6451',
       qty: 19,
       price: Decimal128("81.00"),
       cost: Decimal128("77.76") } ],
  total: Decimal128("341") }

$sort
对文档进行排序升序:1 降序:-1

json复制代码--用名字排序
db.orders.aggregate([{
        $sort:{name:1}
    },{
        $project:{_id:0,name:1}
    }
])
--结果
{ name: 'Aaliyah Bruen' }
{ name: 'Aaliyah Erdman' }
{ name: 'Aaliyah Fahey' }
{ name: 'Aaliyah Gerhold' }
{ name: 'Aaliyah Graham' }
{ name: 'Aaliyah Greenfelder' }
{ name: 'Aaliyah Konopelski' }
{ name: 'Aaliyah Kovacek' }
{ name: 'Aaliyah Kuphal' }
{ name: 'Aaliyah Lueilwitz' }
{ name: 'Aaliyah Maggio' }
......

$sortByCount
根据某个字段分组计算 count 值然后按照这个值降序排序

json复制代码db.orders.aggregate({
    $sortByCount:"$status"
})
--结果
{ _id: 'created', count: 20087 }
{ _id: 'shipping', count: 20017 }
{ _id: 'completed', count: 20015 }
{ _id: 'cancelled', count: 19978 }
{ _id: 'fulfilled', count: 19903 }

本文转载自: 掘金

开发者博客 – 和开发相关的这里全都有

MongoDB之路-聚合操作

mongo的聚合操作和mysql的查询类比

MongoDB聚合(Aggregate)

`$count`

`$group`

`$match`

`$unwind`

`$project`

`$limit`

`$skip`

`$sort`

`$sortByCount`