概述
随机游走(Random Walk)从图中的某个节点开始,随机移动至其中一个邻居节点;这个过程通常会重复进行预设的次数。这个概念最早由英国数学家和生物统计学家Karl Pearson于1905年引入,自那时以来,它已经成为了研究各种系统的基石,而不仅限于图论领域。
- K. Pearson, The Problem of the Random Walk (1905)
基本概念
随机游走
随机游走是一种数学模型,用于模拟以随机或不可预测的方式进行的一系列步骤,类似于一个醉汉酒后乱步所形成路径。
基本的随机游走在一维空间中进行:一个节点从数轴的原点开始,每次向上或向下移动一个单位,移动的方向概率相等。一个10步随机游走的示例如下:

以下是多次执行这种随机游走的示例,每次游走100步:

图中的随机游走
在图中,随机游走从一个节点开始,并依次经过相邻节点形成路径。这个过程由游走深度控制,该深度确定要访问的节点数。
嬴图的随机游走算法实现的是经典随机游走。默认情况下,每条边被赋予相同的权重(等于1),从而遍历每条边的概率相等。当指定边权重时,遍历每条边的概率与其权重成正比。需要注意的是,随机游走存在多种变体,例如Node2Vec游走和Struc2Vec游走。
特殊说明
- 节点可以沿自环边游走。
- 随机游走无法从一个孤点开始,因为没有相邻的边可以继续前进。
- 随机游走的结果与边的方向无关。
示例图

在一个空图中运行以下语句定义图结构并插入数据:
ALTER EDGE default ADD PROPERTY {
  score float
};
INSERT (A:default {_id: "A"}),
       (B:default {_id: "B"}),
       (C:default {_id: "C"}),
       (D:default {_id: "D"}),
       (E:default {_id: "E"}),
       (F:default {_id: "F"}),
       (G:default {_id: "G"}),
       (H:default {_id: "H"}),
       (I:default {_id: "I"}),
       (J:default {_id: "J"}),
       (K:default {_id: "K"}),
       (A)-[:default {score: 1}]->(B),
       (A)-[:default {score: 3}]->(C),
       (C)-[:default {score: 1.5}]->(D),
       (D)-[:default {score: 2.4}]->(C),
       (D)-[:default {score: 5}]->(F),
       (E)-[:default {score: 2.2}]->(C),
       (E)-[:default {score: 0.6}]->(F),
       (F)-[:default {score: 1.5}]->(G),
       (G)-[:default {score: 2}]->(J),
       (H)-[:default {score: 2.5}]->(G),
       (H)-[:default {score: 1}]->(I),
       (I)-[:default {score: 3.1}]->(I),
       (J)-[:default {score: 2.6}]->(G);
create().edge_property(@default, "score", float);
insert().into(@default).nodes([{_id:"A"},{_id:"B"},{_id:"C"},{_id:"D"},{_id:"E"},{_id:"F"},{_id:"G"},{_id:"H"},{_id:"I"},{_id:"J"},{_id:"K"}]);
insert().into(@default).edges([{_from:"A", _to:"B", score:1}, {_from:"A", _to:"C", score:3}, {_from:"C", _to:"D", score:1.5}, {_from:"D", _to:"C", score:2.4}, {_from:"D", _to:"F", score:5}, {_from:"E", _to:"C", score:2.2}, {_from:"E", _to:"F", score:0.6}, {_from:"F", _to:"G", score:1.5}, {_from:"G", _to:"J", score:2}, {_from:"H", _to:"G", score:2.5}, {_from:"H", _to:"I", score:1}, {_from:"I", _to:"I", score:3.1}, {_from:"J", _to:"G", score:2.6}]);
创建HDC图
将当前图集全部加载到HDC服务器hdc-server-1上,并命名为 my_hdc_graph:
CREATE HDC GRAPH my_hdc_graph ON "hdc-server-1" OPTIONS {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}
hdc.graph.create("my_hdc_graph", {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}).to("hdc-server-1")
参数
算法名:random_walk
| 参数名 | 类型 | 规范 | 默认值 | 可选 | 描述 | 
|---|---|---|---|---|---|
| ids | [] _id | / | / | 是 | 通过 _id指定随机游走的起点;若未设置则计算所有点 | 
| uuids | [] _uuid | / | / | 是 | 通过 _uuid指定随机游走的起点;若未设置则计算所有点 | 
| walk_length | Integer | ≥1 | 1 | 是 | 每次游走的深度,即访问的节点数量 | 
| walk_num | Integer | ≥1 | 1 | 是 | 从每个指定节点开始的游走次数 | 
| edge_schema_property | []" <@schema.?><property>" | / | / | 是 | 作为权重的数值类型边属性,权重值为所有指定属性值的总和;不包含指定属性的边将被忽略 | 
| return_id_uuid | String | uuid,id,both | uuid | 是 | 在结果中使用 _uuid、_id或同时使用两者来表示点 | 
| limit | Integer | ≥-1 | -1 | 是 | 限制返回的结果数; -1返回所有结果 | 
文件回写
CALL algo.random_walk.write("my_hdc_graph", {
  return_id_uuid: "id",
  walk_length: 6,
  walk_num: 2
}, {
  file: {
    filename: "walks"
  }
})
algo(random_walk).params({
  projection: "my_hdc_graph",
  return_id_uuid: "id",
  walk_length: 6,
  walk_num: 2
}).write({
  file:{
    filename: 'walks'
}})
结果:
_ids
J,G,H,G,F,D,
D,C,D,C,A,C,
F,G,H,I,I,I,
H,G,H,I,H,G,
B,A,C,E,C,D,
A,C,D,C,D,C,
E,C,E,F,E,C,
C,D,C,E,F,D,
I,I,I,H,G,J,
G,J,G,J,G,H,
J,G,J,G,F,E,
D,C,E,C,D,F,
F,D,C,A,B,A,
H,I,I,I,H,I,
B,A,B,A,C,E,
A,C,D,C,A,B,
E,F,G,F,D,F,
C,E,F,E,F,D,
I,I,H,I,I,I,
G,H,I,I,H,I,
完整返回
CALL algo.random_walk.run("my_hdc_graph", {
  return_id_uuid: "id",
  walk_length: 6,
  walk_num: 2,
  edge_schema_property: 'score'
}) YIELD walks
RETURN walks
exec{
  algo(random_walk).params({
    return_id_uuid: "id",
    walk_length: 6,
    walk_num: 2,
    edge_schema_property: 'score'
  }) as walks
  return walks
} on my_hdc_graph
结果:
| _ids | 
|---|
| ["J","G","J","G","J","G"] | 
| ["D","F","E","C","E","C"] | 
| ["F","D","F","D","F","G"] | 
| ["H","I","I","I","I","H"] | 
| ["B","A","C","A","C","D"] | 
| ["A","C","A","B","A","B"] | 
| ["E","C","E","F","D","C"] | 
| ["C","A","C","D","F","D"] | 
| ["I","H","I","I","I","I"] | 
| ["G","H","G","J","G","J"] | 
| ["J","G","J","G","J","G"] | 
| ["D","F","D","C","E","C"] | 
| ["F","D","C","D","C","E"] | 
| ["H","I","H","G","J","G"] | 
| ["B","A","C","D","F","G"] | 
| ["A","C","D","C","A","C"] | 
| ["G","J","G","F","D","F"] | 
| ["H","I","I","I","I","H"] | 
| ["F","D","F","D","F","G"] | 
| ["D","F","E","C","E","C"] | 
| ["J","G","J","G","J","G"] | 
流式返回
CALL algo.random_walk.stream("my_hdc_graph", {
  return_id_uuid: "id",
  walk_length: 5,
  walk_num: 1,
  edge_schema_property: '@default.score'
}) YIELD walks
RETURN walks
exec{
  algo(random_walk).params({
    return_id_uuid: "id",
    walk_length: 5,
    walk_num: 1,
    edge_schema_property: '@default.score'
  }).stream() as walks
  return walks
} on hdc_randomWalk
结果:
| _ids | 
|---|
| ["J","G","J","G","J"] | 
| ["D","F","G","J","G"] | 
| ["F","G","F","D","C"] | 
| ["H","G","H","G","J"] | 
| ["B","A","C","D","F"] | 
| ["A","C","A","C","A"] | 
| ["E","F","D","F","D"] | 
| ["C","D","F","D","F"] | 
| ["I","I","I","I","I"] | 
| ["G","H","G","J","G"] | 
