Millet Porridge

English version of https://corvo.myseu.cn

0%

Use Python AST to analysis expression and sort by weight

Code in this blog has been tested in Python3.7.

Problem

In a comment’s system, the operator need to dynamic adjust the factor to influence the weight of the comment, and then sort the comments by the weights.

For example, there are many comments in this data structure:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
import datetime
topics = [
{
"content": "这是评论1", # 评论内容
"likes": 10, # 点赞数
"user_weight": 10, # 用户权重
'time': datetime.datetime(2019, 2, 20, 19, 31, 58, 412000), # 发布时间
},
{
"content": "这是评论2", # 评论内容
"likes": 5, # 点赞数
"user_weight": 20, # 用户权重
'time': datetime.datetime(2019, 2, 20, 19, 41, 58, 412000), # 发布时间
},
{
"content": "这是评论1", # 评论内容
"likes": 20, # 点赞数
"user_weight": 5, # 用户权重
'time': datetime.datetime(2019, 2, 20, 20, 1, 58, 412000), # 发布时间
}
]

We want to use an algorithm to sort the comments by the weights, the algorithm is:

1
user_weight * 5 + likes * 1 + posted_seconds * -0.02

The user_weight is the most important factor, and the likes is the second most important factor.

A simple solution

At least, you may introduced an algorithm using for-loop to calculate the comments weights.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import time
# we use this data to represent the current time
cur_time = 1550664218.0

for topic in topics:
topic['weight'] = 5 * topic['user_weight'] + topic['likes'] + \
(cur_time - time.mktime(topic['time'].timetuple())) * -0.02

# Than you can sort by python sort function.

[{'content': '这是评论1',
'likes': 10,
'user_weight': 10,
'time': datetime.datetime(2019, 2, 20, 19, 31, 58, 412000),
'weight': 22.0},
{'content': '这是评论2',
'likes': 5,
'user_weight': 20,
'time': datetime.datetime(2019, 2, 20, 19, 41, 58, 412000),
'weight': 79.0},
{'content': '这是评论1',
'likes': 20,
'user_weight': 5,
'time': datetime.datetime(2019, 2, 20, 20, 1, 58, 412000),
'weight': 43.0}]

A better solution

Sometimes, the operator want to modify the algorithm, for example: 1. modify the factor user_weight to be user_weight * 2, and the factor likes to be likes * 2. 2. Create a new factor specify the expression for every topic.

In this time, the old for-loop function can not work anymore. We have to refactor the implementation. The new function need input an expression, and a list of topics, then return the list of topics with the weights.

How to parse the expression

In my college life, we have been taught with Reverse Polish Notation to get the value of expression. And we also learned Principles of Compiler Design. we can parse the expression to an Abstruct Syntax Tree (AST).

AST

This picture is the AST of the expression (a+b)*(c-d)). Once we were being told the exact value of every variable, we could get the result.

Well, it seems we don’t need to parse the expression by ourselves, because the Python has a built-in module called ast.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
import ast
import operator

BINOPS = {
ast.Add: operator.add,
ast.Sub: operator.sub,
ast.Mult: operator.mul,
ast.Div: operator.truediv,
ast.Mod: operator.mod,
}
_node = ast.parse('(a+b)*(c-d)', mode='eval')

_data = {
'a': 2,
'b': 3,
'c': 4,
'd': 5,
}

def make_calculate(node, data):
def _eval(node):
if isinstance(node, ast.Expression):
return _eval(node.body)
elif isinstance(node, ast.Name): # Get the name `a, b, c, d`, return value
return data[node.id]
elif isinstance(node, ast.Num): # Get the number
return node.n
elif isinstance(node, ast.BinOp): # Get the binary operator, + - * /
return BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
else:
raise Exception('Unsupported type {}'.format(node))
return _eval(node)

make_calculate(_node, _data) # The result is -5

The final solution

We could ask user to give us a simple expression, and then we can get the weight by this expression for every topic.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
weight_expr = """
'user_weight' * 5 + 'likes' + (0 - 1) * 0.02 * 'time_offset'
"""

# Get the object property value
def get_user_weight(topic):
return topic['user_weight']

def get_likes(topic):
return topic['likes']

def get_time_offset(topic):
cur_time = 1550664218.0 # 实际中应该使用 time.time()
return cur_time - time.mktime(topic['time'].timetuple())

# 存储了一些函数指针
TOPIC_UNIT = {
'likes': get_likes,
'user_weight': get_user_weight,
'time_offset': get_time_offset,
}

# Build AST
_node = ast.parse(weight_expr, mode='eval')

def make_calculate(node, topic):

def _eval(node):
if isinstance(node, ast.Expression):
return _eval(node.body)
elif isinstance(node, ast.Str):
# node.s will be 'user_weight', 'likes', or 'time_offset',
# we can use get_user_weight, get_likes, to get the topic property value
return TOPIC_UNIT[node.s](topic)
elif isinstance(node, ast.Num):
return node.n
elif isinstance(node, ast.BinOp):
return BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
else:
raise Exception('Unsupported type {}'.format(node))
return _eval(node.body)


for _topic in topics:
_topic['weight'] = make_calculate(_node, _topic)

Summary

What we have done to simplify the real problem: we introduced a new input for operator to let him/her to decide the topic weight, and even more, we decoupled the topic weight calculation from the topic data. This could help a lot when we need to add a new property or modify the expression.