Compiling the Activity Participant List: A Look at the Tricks Behind It

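This article is an entry in the 好文召集令 (Call for Good Articles) campaign, which accepts submissions in two tracks, back end and front end, with a prize pool of 20,000 yuan.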

Introduction

A while ago I built an activity ranking. It has plenty of shortcomings, but it still received a lot of positive feedback.


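Some readers were curious about how it was implemented. Because the code is fairly tightly coupled, I did not know how to write this article at first, so I kept putting it off.

I have been busy lately and have not fixed bugs or added features in time, so I decided to open-source the project and let everyone help build it out.

Contributions are welcome.
Project: github.com/ytwp/juejin…

This article mainly explains how the activity ranking is implemented.

Implementation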

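  1. This feature depends on users, so the first step is to discover them.
     * So far, users are discovered through columns and through the latest articles under each tag, and their user IDs are collected.
     * Inactive users are filtered out to reduce the number of requests.
  2. On a schedule, query each user's information and save it, which amounts to taking a snapshot.
  3. When querying a user, also fetch their recent articles (roughly the last month; the code keeps articles from the last 45 days) to provide the data for this feature.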
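Step 1 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `ArticleRef` type, the `activeAuthorIds` method, and the 45-day activity window are all assumptions made for the example. The idea is simply to collect author IDs from article listings and keep only authors who have published recently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: collect author IDs from article listings and drop inactive authors. */
final class UserDiscoverySketch {

    /** Minimal stand-in for one entry of a column or tag listing (assumed shape). */
    static final class ArticleRef {
        final String authorId;
        final long createdAtEpochSec;

        ArticleRef(String authorId, long createdAtEpochSec) {
            this.authorId = authorId;
            this.createdAtEpochSec = createdAtEpochSec;
        }
    }

    /**
     * Returns the IDs of authors whose newest article is younger than maxAgeDays,
     * in first-seen order. Authors without a recent article count as inactive.
     */
    static List<String> activeAuthorIds(List<ArticleRef> listing, long nowEpochSec, int maxAgeDays) {
        // Remember the newest article timestamp seen for each author.
        Map<String, Long> newestByAuthor = new LinkedHashMap<>();
        for (ArticleRef ref : listing) {
            newestByAuthor.merge(ref.authorId, ref.createdAtEpochSec, Long::max);
        }
        long cutoff = nowEpochSec - maxAgeDays * 24L * 60 * 60;
        List<String> active = new ArrayList<>();
        for (Map.Entry<String, Long> e : newestByAuthor.entrySet()) {
            if (e.getValue() > cutoff) {
                active.add(e.getKey());
            }
        }
        return active;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000L;
        long day = 24L * 60 * 60;
        List<ArticleRef> listing = List.of(
                new ArticleRef("u1", now - 2 * day),
                new ArticleRef("u2", now - 90 * day),
                new ArticleRef("u1", now - 120 * day));
        System.out.println(activeAuthorIds(listing, now, 45)); // prints [u1]
    }
}
```

Filtering before the snapshot step means that every later scheduled run issues requests only for authors who are likely to have new articles.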
Java, excerpt of the core source code:

public void run() {
    log.info("Fetching user snapshots");
    String now = LocalDateTime.now().format(yyyyMMddHH);
    String path = "./j-" + LocalDate.now().format(yyyyMMdd) + ".json";
    try {
        FileUtil.initFile(path);
        // Open in append mode; try-with-resources closes both writers even if an exception is thrown
        try (FileWriter fw = new FileWriter(path, true);
             PrintWriter pw = new PrintWriter(fw)) {
            int i = 0;
            // Iterate over all users and fetch each user's information.
            // Each result is written to the file as one JSON line for the later analysis.
            for (String userId : userIdSet) {
                JueJInApi.UserData userData = JueJInApi.getUser(userId);
                if (userData == null) {
                    // Retry once if the first attempt failed
                    userData = JueJInApi.getUser(userId);
                }
                if (userData != null) {
                    // Tag the snapshot with the hour it was taken
                    userData.setTime(now);
                    pw.println(JSONUtil.toJsonStr(userData));
                }
                log.info((++i) + " user snapshot: " + userId);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    log.info("Finished fetching user snapshots: " + now);
}

public static UserData getUser(String user_id) {
    // Request body for the article list query
    Map<String, Object> map1 = new HashMap<>();
    map1.put("cursor", "0");
    map1.put("sort_type", 2);
    map1.put("user_id", user_id);
    // Request body for the column list query
    Map<String, Object> map2 = new HashMap<>();
    map2.put("audit_status", null);
    map2.put("cursor", "0");
    map2.put("limit", 10);
    map2.put("user_id", user_id);
    try {
        // Every article carries its author's info, so fetching the articles is enough.
        // Fetch 10 at a time until all articles from the last 45 days have been read.
        long cutoff = System.currentTimeMillis() / 1000 - 60L * 60 * 24 * 45;
        List<ArticleData> articleDataList = new ArrayList<>();
        for (int cursor = 0; ; cursor += 10) {
            map1.put("cursor", String.valueOf(cursor));
            String res1 = HttpUtil.post("https://api.juejin.cn/content_api/v1/article/query_list", JSONUtil.toJsonStr(map1));
            JSONObject jsonObject = JSONUtil.parseObj(res1);
            Integer count = jsonObject.getInt("count");
            if (count == 0) {
                break;
            }
            JSONArray data1 = jsonObject.getJSONArray("data");
            List<ArticleData> dataList = JSONUtil.toList(data1, ArticleData.class);
            // Drop articles older than 45 days
            List<ArticleData> data2 = dataList.stream()
                    .filter(data -> Long.parseLong(data.getArticle_info().getCtime()) > cutoff)
                    .collect(Collectors.toList());

            articleDataList.addAll(data2);

            // Stop as soon as a page contains an article older than 45 days
            if (dataList.size() != data2.size()) {
                break;
            }
            // Also stop once all articles have been fetched
            if (jsonObject.getInt("cursor") >= count) {
                break;
            }
        }
        // Keep the user info from one article and set it to null on every article,
        // so the snapshot does not waste disk space on repeated copies
        AuthorUserInfo authorUserInfo = null;
        for (ArticleData articleData : articleDataList) {
            authorUserInfo = articleData.getAuthor_user_info();
            articleData.setAuthor_user_info(null);
        }
        // All of this user's columns, used for the column statistics
        List<SelfData> selfDataList = new ArrayList<>();
        for (int cursor = 0; ; cursor += 10) {
            map2.put("cursor", String.valueOf(cursor));
            String res2 = HttpUtil.post("https://api.juejin.cn/content_api/v1/column/self_center_list", JSONUtil.toJsonStr(map2));
            JSONObject jsonObject = JSONUtil.parseObj(res2);
            Integer count = jsonObject.getInt("count");
            if (count == 0) {
                break;
            }
            JSONArray data2 = jsonObject.getJSONArray("data");
            selfDataList.addAll(JSONUtil.toList(data2, SelfData.class));
            if (jsonObject.getInt("cursor") >= count) {
                break;
            }
        }
        // Wrap everything into the user snapshot
        UserData userData = new UserData();
        userData.setUser_id(user_id);
        userData.setArticle_list(articleDataList);
        userData.setSelf_center_list(selfDataList);
        userData.setAuthor_user_info(authorUserInfo);
        return userData;
    } catch (Exception e) {
        e.printStackTrace();
    }
    // Any failure yields null, which lets the caller retry
    return null;
}
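The paging logic in `getUser` can be isolated into a small function, which makes the stop conditions easier to see. The following is a simplified sketch rather than the project's code: `Page`, `fetchRecent`, and the `fetchPage` callback are hypothetical names, and an in-memory page source stands in for the real HTTP call. It also derives the "last page" test from the requested cursor and page size instead of the `cursor` field returned by the API. Like the original, it assumes articles arrive newest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical sketch: page through a newest-first listing until an age cutoff is reached. */
final class RecentArticlePagerSketch {

    /** One page of results: creation times (epoch seconds) plus the total item count. */
    static final class Page {
        final List<Long> ctimes;
        final int totalCount;

        Page(List<Long> ctimes, int totalCount) {
            this.ctimes = ctimes;
            this.totalCount = totalCount;
        }
    }

    /** Collects all creation times newer than cutoffEpochSec, fetching pageSize items per request. */
    static List<Long> fetchRecent(IntFunction<Page> fetchPage, int pageSize, long cutoffEpochSec) {
        List<Long> recent = new ArrayList<>();
        for (int cursor = 0; ; cursor += pageSize) {
            Page page = fetchPage.apply(cursor);
            if (page.totalCount == 0 || page.ctimes.isEmpty()) {
                break; // nothing (more) to read
            }
            boolean sawOld = false;
            for (long ctime : page.ctimes) {
                if (ctime > cutoffEpochSec) {
                    recent.add(ctime);
                } else {
                    sawOld = true;
                }
            }
            if (sawOld) {
                break; // reached articles older than the cutoff
            }
            if (cursor + pageSize >= page.totalCount) {
                break; // that was the last page
            }
        }
        return recent;
    }

    public static void main(String[] args) {
        List<Long> all = List.of(500L, 400L, 300L, 200L, 100L);
        // Fake page source serving two items per request from the list above.
        IntFunction<Page> fake = cursor -> new Page(
                all.subList(Math.min(cursor, all.size()), Math.min(cursor + 2, all.size())),
                all.size());
        System.out.println(fetchRecent(fake, 2, 250L)); // prints [500, 400, 300]
    }
}
```

Stopping at the first old article keeps the number of requests per user small, which matters when the same loop runs for every tracked user on every scheduled run.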
  4. Analyze the data
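Because the snapshot file for a day can contain several records for the same user, the analysis first reduces them to the latest snapshot per user. A minimal sketch of that reduction, written in Java, might look like this. The `Snapshot` class is a hypothetical stand-in for `UserData`, and the `time` field is assumed to use the `yyyyMMddHH` format, so comparing the strings also compares the timestamps.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch: keep only the most recent snapshot for each user. */
final class LatestSnapshotSketch {

    /** Minimal stand-in for a user snapshot (assumed shape). */
    static final class Snapshot {
        final String userId;
        final String time; // yyyyMMddHH, e.g. "2021061520"

        Snapshot(String userId, String time) {
            this.userId = userId;
            this.time = time;
        }
    }

    /** Groups by user ID and, on collisions, keeps the snapshot with the greater time. */
    static Map<String, Snapshot> latestPerUser(List<Snapshot> snapshots) {
        return snapshots.stream().collect(Collectors.toMap(
                s -> s.userId,
                Function.identity(),
                BinaryOperator.maxBy(Comparator.comparing((Snapshot s) -> s.time))));
    }

    public static void main(String[] args) {
        Map<String, Snapshot> latest = latestPerUser(List.of(
                new Snapshot("u1", "2021061508"),
                new Snapshot("u1", "2021061520"),
                new Snapshot("u2", "2021061512")));
        System.out.println(latest.get("u1").time); // prints 2021061520
    }
}
```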
Scala:

// I think this code could already be called the most baffling code ever; I can barely explain it myself
def run(): Unit = {
  log.info("Computing activity articles")
  // Various time values
  val now = LocalDateTime.now.format(yyyyMMddHH)
  val runTime = System.currentTimeMillis() / 1000
  val yyyyMMddStr = LocalDate.now.format(yyyyMMdd)
  val yyyyMMddInt = Integer.parseInt(yyyyMMddStr)
  try {
    // Input and output paths
    val path = FilePathConstant.EXPLORE_DARA_PATH.format(yyyyMMddStr)
    // `now` is already a formatted yyyyMMddHH string
    val outPath = FilePathConstant.BAK_ACTIVITY_REPORT_DARA_PATH.format(now)
    val outNowPath = FilePathConstant.ACTIVITY_REPORT_DARA_PATH
    val rulePath = FilePathConstant.ACTIVITY_RULE_PATH
    val configPath = FilePathConstant.ACTIVITY_CONFIG_PATH

    // Read the snapshot data and the configuration files
    val userDataList = FileUtil.readLineJsonList(path, classOf[JueJInApi.UserData]).asScala
    val activityRuleList = FileUtil.readJsonList(rulePath, classOf[ActivityRule]).asScala
    val activityConfig = FileUtil.readJson(configPath, classOf[ActivityConfig])

    // To avoid hammering the Juejin API, remember when the last run finished
    // and skip articles published before that
    val lastRunTime = activityConfig.getLastRunTime

    log.info(s"Finished reading data, size: ${userDataList.size}")

    log.info("Start computing activity articles")
    /**
     * Activity articles
     * 1. Group the snapshots by user
     * 2. Keep the latest snapshot of the day for each user
     * 3. Match each recent article against the activity rules
     */
    val resMap = mutable.Map[Int, ListBuffer[(ActivityRule, ArticleData)]]()
    // groupBy + values + map: pick each user's latest record of the day
    userDataList
      .groupBy(_.getUser_id)
      .values
      .map(userData => userData.maxBy(_.getTime.toInt))
      .foreach(userData => {
        // Start counting
        userData.getArticle_list.asScala
          .foreach(articleData => {
            val ctime = articleData.getArticle_info.getCtime.toLong
            // First run, or published later than (last run time - 12 hours)
            if (lastRunTime == 0 || ctime > (lastRunTime - 60 * 60 * 12)) {
              log.info(s"Fetching article detail: ${articleData.getArticle_id}")
              // Fetch the article detail and check whether it is an activity article
              val detail = JueJInApi.getArticleDetail(articleData.getArticle_id)
              if (detail != null) {
                activityRuleList.foreach(activityRule => {
                  // Record the article if the given text contains the activity keyword
                  def isActivity(content: String): Unit = {
                    if (content.contains(activityRule.getKeyword)) {
                      log.info(s"Matched activity article: ${activityRule.getKeyword}")
                      val listBuffer = resMap.getOrElse(activityRule.getId.toInt, ListBuffer[(ActivityRule, ArticleData)]())
                      articleData.setAuthor_user_info(detail.getAuthor_user_info)
                      listBuffer.append((activityRule, articleData))
                      // Add to the final result set
                      resMap.put(activityRule.getId.toInt, listBuffer)
                    }
                  }
                  // Stop counting once the activity has ended
                  if (activityRule.getEndDate > yyyyMMddInt) {
                    // Match against either the article body or the title
                    if (activityRule.getType == "post") {
                      isActivity(detail.getArticle_info.getMark_content)
                    } else if (activityRule.getType == "title") {
                      isActivity(detail.getArticle_info.getTitle)
                    }
                  }
                })
              }
            }
          })
      })
    // Build a map of (user ID -> latest user snapshot)
    val userDataMap: Map[String, JueJInApi.UserData] = userDataList
      .groupBy(_.getUser_id)
      .values
      .map(userData => userData.maxBy(_.getTime.toInt))
      .map(userData => (userData.getUser_id, userData))
      .toMap

    // Read the result of the previous run
    var fileActivityReportList: mutable.Buffer[ActivityReport] = FileUtil.readJsonList(outNowPath, classOf[ActivityReport]).asScala
    log.info("Initializing the activity list")
    // A new activity may have been added, so make sure every activity exists in the result JSON
    val ids = fileActivityReportList.map(_.getId)
    activityRuleList.foreach(rule => {
      val id = rule.getId
      if (!ids.contains(id)) {
        val report = new ActivityReport
        report.setId(id)
        report.setUserActivityReportMap(new util.HashMap[String, ActivityReport.UserActivityReport]())
        fileActivityReportList = report +: fileActivityReportList
      }
      // Remove activities that ended more than 7 days ago
      val end7Date = LocalDate.parse(rule.getEndDate.toString, yyyyMMdd).plusDays(7).format(yyyyMMdd).toInt
      if (yyyyMMddInt > end7Date) {
        fileActivityReportList = fileActivityReportList.filter(r => r.getId != id)
      }
    })
    // Turn the matching rules into a map of (activity ID -> activity rule)
    val ruleMap: Map[Integer, ActivityRule] = activityRuleList.map(rule => (rule.getId, rule)).toMap

    log.info("Transforming data")
    val activityReportList = fileActivityReportList
      .map(activityReport => {
        val id = activityReport.getId
        // orNull instead of get: a report whose rule has been removed yields null instead of throwing
        val rule = ruleMap.get(id).orNull
        // Do not update the data once the activity has expired
        if (rule == null || rule.getEndDate <= yyyyMMddInt) {
          null
        } else {
          // Assemble the per-user data
          val activityReportMap = activityReport.getUserActivityReportMap
          // IDs of all articles that are already recorded for this activity
          val articleIdSet = activityReportMap.asScala.flatMap(a => a._2.getArticleIdSet.asScala).toSet
          val option = resMap.get(id)
          if (option.nonEmpty) {
            option.get.foreach(t => {
              val value = t._2
              val article_id = value.getArticle_id
              if (!articleIdSet.contains(article_id)) {
                val user_id = value.getAuthor_user_info.getUser_id
                val user_name = value.getAuthor_user_info.getUser_name
                var report = activityReportMap.get(user_id)
                if (report == null) {
                  // First activity article for this user: start with an empty report
                  report = new ActivityReport.UserActivityReport()
                  report.setArticleIdSet(new util.HashSet[String]())
                  report.setUser_id(user_id)
                  report.setUser_name(user_name)
                  report.setCount(0)
                  report.setSum_digg_count(0)
                  report.setSum_view_count(0)
                  report.setSum_collect_count(0)
                  report.setSum_comment_count(0)
                }
                val set = report.getArticleIdSet
                set.add(article_id)
                report.setArticleIdSet(set)
                activityReportMap.put(user_id, report)
              }
            })

          }
          activityReport.setUserActivityReportMap(activityReportMap)
          activityReport
        }
      })
      // Drop the empty results
      .filter(a => a != null)
      // Build the final data set
      .map(activityReport => {
        val activityReportMap = activityReport.getUserActivityReportMap.asScala.map(t => {
          val user_id = t._1
          val userActivityReport = t._2
          val set = userActivityReport.getArticleIdSet
          val maybeUserData = userDataMap.get(user_id)
          if (maybeUserData.nonEmpty) {
            val userData = maybeUserData.get
            // Recompute the totals from the user's latest snapshot
            val datas = userData.getArticle_list.asScala.filter(article => set.contains(article.getArticle_id))
            userActivityReport.setCount(datas.size)
            userActivityReport.setSum_view_count(datas.map(_.getArticle_info.getView_count.toInt).sum)
            userActivityReport.setSum_digg_count(datas.map(_.getArticle_info.getDigg_count.toInt).sum)
            userActivityReport.setSum_collect_count(datas.map(_.getArticle_info.getCollect_count.toInt).sum)
            userActivityReport.setSum_comment_count(datas.map(_.getArticle_info.getComment_count.toInt).sum)
          }
          (user_id, userActivityReport)
        }).toMap.asJava
        activityReport.setUserActivityReportMap(activityReportMap)
        activityReport
      })
      .asJava

    log.info("Saving data")
    activityConfig.setLastRunTime(runTime)
    FileUtil.writeJson(outPath, activityReportList)
    FileUtil.writeJson(outNowPath, activityReportList)
    FileUtil.writeJson(configPath, activityConfig)
    reportService.updateActivityReport(activityReportList, activityRuleList)

  } catch {
    case e: IOException =>
      e.printStackTrace()
  }
  log.info("Finished computing activity articles: " + now)
}
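Stripped of file handling, the core of the job is to decide whether an article belongs to an activity, then sum each author's numbers and sort the authors. The sketch below shows that flow in Java under simplifying assumptions. `Rule`, `Article`, `UserTotal`, `matches`, and `rank` are hypothetical names, and the sort key (total views, descending) is an assumption, since the excerpt does not show how the final ranking is ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: match articles against an activity rule and rank authors by total views. */
final class ActivityRankSketch {

    static final class Rule {
        final String type;    // "title" or "post", the two rule types used in the excerpt
        final String keyword;

        Rule(String type, String keyword) {
            this.type = type;
            this.keyword = keyword;
        }
    }

    static final class Article {
        final String authorId;
        final String title;
        final String content;
        final int views;
        final int diggs;

        Article(String authorId, String title, String content, int views, int diggs) {
            this.authorId = authorId;
            this.title = title;
            this.content = content;
            this.views = views;
            this.diggs = diggs;
        }
    }

    static final class UserTotal {
        final String userId;
        int count;
        int views;
        int diggs;

        UserTotal(String userId) {
            this.userId = userId;
        }
    }

    /** An article matches when the field selected by the rule type contains the keyword. */
    static boolean matches(Rule rule, Article a) {
        if ("title".equals(rule.type)) {
            return a.title.contains(rule.keyword);
        }
        if ("post".equals(rule.type)) {
            return a.content.contains(rule.keyword);
        }
        return false;
    }

    /** Sums the matching articles per author and sorts authors by total views, highest first. */
    static List<UserTotal> rank(Rule rule, List<Article> articles) {
        Map<String, UserTotal> totals = new LinkedHashMap<>();
        for (Article a : articles) {
            if (!matches(rule, a)) {
                continue;
            }
            UserTotal t = totals.computeIfAbsent(a.authorId, UserTotal::new);
            t.count++;
            t.views += a.views;
            t.diggs += a.diggs;
        }
        List<UserTotal> ranked = new ArrayList<>(totals.values());
        ranked.sort(Comparator.comparingInt((UserTotal t) -> t.views).reversed());
        return ranked;
    }
}
```

In this sketch the totals are rebuilt from scratch on every call. The original job instead stores the matched article IDs per user and recomputes the sums from the latest snapshot, so view and like counts keep updating without fetching every article detail again.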

Reposted from: Juejin (掘金)
