Recommendations with Neo4j (11): Collaborative Filtering (Pearson Similarity)

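Pearson similarity, also called Pearson correlation, is another similarity metric we can use. It is particularly well suited to product recommendations because it accounts for the fact that different users have different mean ratings: on average, some users tend to give higher ratings than others. Pearson similarity compares each rating with the mean rating of the user who gave it, so a consistently generous rater and a consistently strict rater can still come out as similar if their preferences move in the same direction.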

http://guides.neo4j.com/sandbox/recommendations/img/pearson.png
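
Assuming the linked image shows the standard Pearson formula, it can be written out as follows, in the form the query below computes it. The symbols are notation introduced here, not taken from the image: M is the set of movies both users rated, r_{u,m} is the rating user u gave movie m, and \bar{r}_u is the mean of all of user u's ratings.

```latex
\[
\operatorname{pearson}(u_1, u_2) =
\frac{\sum_{m \in M} (r_{u_1,m} - \bar{r}_{u_1})\,(r_{u_2,m} - \bar{r}_{u_2})}
     {\sqrt{\sum_{m \in M} (r_{u_1,m} - \bar{r}_{u_1})^2 \;\sum_{m \in M} (r_{u_2,m} - \bar{r}_{u_2})^2}}
\]
```

The result lies between -1 (opposite tastes) and 1 (identical tastes after mean correction), and is undefined when the denominator is zero.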

Find the users most similar to Cynthia Freeman according to Pearson similarity:

// Mean of all of Cynthia Freeman's ratings
MATCH (u1:User {name:"Cynthia Freeman"})-[r:RATED]->(m:Movie)
WITH u1, avg(r.rating) AS u1_mean

// Pair up the ratings of movies both users rated; keep users with more than 10 movies in common
MATCH (u1)-[r1:RATED]->(m:Movie)<-[r2:RATED]-(u2)
WITH u1, u1_mean, u2, COLLECT({r1: r1, r2: r2}) AS ratings WHERE size(ratings) > 10

// Mean of all of the other user's ratings
MATCH (u2)-[r:RATED]->(m:Movie)
WITH u1, u1_mean, u2, avg(r.rating) AS u2_mean, ratings

UNWIND ratings AS r

// Numerator and denominator of the Pearson formula, summed over the shared movies
WITH sum( (r.r1.rating-u1_mean) * (r.r2.rating-u2_mean) ) AS nom,
     sqrt( sum( (r.r1.rating - u1_mean)^2) * sum( (r.r2.rating - u2_mean) ^2)) AS denom,
     u1, u2 WHERE denom <> 0

RETURN u1.name, u2.name, nom/denom AS pearson
ORDER BY pearson DESC LIMIT 100
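
To see what the query computes on a small scale, the same arithmetic can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `pearson`, the `min_shared` parameter, and the three users with their ratings are hypothetical and do not come from the dataset. Like the query, the sketch takes each user's mean over all of their ratings and sums only over the movies both users rated.

```python
from math import sqrt


def pearson(ratings_a, ratings_b, min_shared=10):
    """Pearson similarity between two users, following the steps of the query.

    ratings_a and ratings_b map a movie id to that user's rating.
    Returns None if the users share min_shared movies or fewer,
    or if the denominator is zero.
    """
    # Movies rated by both users (the `ratings` collection in the query).
    shared = ratings_a.keys() & ratings_b.keys()
    if len(shared) <= min_shared:
        return None

    # Each user's mean over ALL of their ratings, not just the shared ones.
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)

    # Numerator and denominator, summed over the shared movies only.
    nom = sum((ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in shared)
    denom = sqrt(
        sum((ratings_a[m] - mean_a) ** 2 for m in shared)
        * sum((ratings_b[m] - mean_b) ** 2 for m in shared)
    )
    if denom == 0:
        return None
    return nom / denom


# Hypothetical ratings: alice rates everything two points higher than bob,
# while carol's preferences run in the opposite order.
alice = {"m1": 5, "m2": 4, "m3": 3}
bob = {"m1": 3, "m2": 2, "m3": 1}
carol = {"m1": 1, "m2": 2, "m3": 3}

print(pearson(alice, bob, min_shared=2))    # same taste, different mean
print(pearson(alice, carol, min_shared=2))  # opposite taste
```

With this toy data, alice and bob come out as perfectly similar even though alice's ratings are consistently higher, which is the mean correction described at the start of the article. The threshold is lowered to 2 here only because the example has three movies; the query requires more than 10 shared movies.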