A good hash is hard to find

I’m sure anyone who has worked with an A/B testing framework will nod knowingly when I say: the devil is in the details. When you’re looking for small signals in noisy human behavior, the tiniest bit of bias or extra variance is amplified and can easily lead to false positives or missed opportunities. Recently, at OfferUp, we discovered that something as simple as choosing the right hash function can have surprisingly non-simple implications.

Hash me if you can

Two key assumptions underlying A/B testing are:

  1. Within a given experiment, individuals are assigned to a test group randomly. This could be violated, for instance, if users of a certain version of the app fail to be assigned to a test and are by default always in control.
  2. Assignment across tests is independent. Knowing a user’s assignment in one test tells you nothing about their assignment in another test.

If both of these assumptions are met, differences from user to user should be averaged out and you can say that any difference in outcomes must be due to the test intervention. In practice, it’s also desirable for a user to remain in the same variant throughout the duration of a test. Subjecting users to constant change is bad for UX and bad for experimentation because there’s often an accommodation period.

A common solution for assigning users to test buckets is via a hashing layer. This approach is deterministic — the same user identifier gets hashed to the same bucket — and it avoids the scaling challenges of using caching to track a user’s assignment.

In its basic form, the user bucketing scheme at OfferUp for a given experiment is:

  • Take a user ID and combine it with the experiment ID / experiment salt using an XOR.
  • Run the resulting user + experiment object through a hash.
  • Finally, mod the hash output to get a fixed range of outcomes, e.g. (x % 10), and then assign a user to a bucket based on the mod-ed value (a minimal sketch of this scheme follows below).
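
To make the mechanics concrete, here is a minimal Python sketch of that scheme. Our production implementation is in Java and differs in detail; the bucket count, salt, and user ID below are made up for illustration.

```python
# Minimal sketch of the bucketing scheme described above (illustrative values only).
NUM_BUCKETS = 10  # assumed bucket count for this example

def assign_bucket(user_id: int, experiment_salt: int, num_buckets: int = NUM_BUCKETS) -> int:
    combined = user_id ^ experiment_salt   # combine user ID and experiment salt via XOR
    hashed = hash(combined)                # run the combined value through a hash function
    return hashed % num_buckets            # mod the hash output into a fixed range of buckets

# The assignment is deterministic: the same user and salt always land in the same bucket.
assert assign_bucket(12345, 0xC0FFEE) == assign_bucket(12345, 0xC0FFEE)
```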

It’s a mod mod world

We discovered an issue with user randomization while debugging an unexpected A/B test result. We noticed that users in the control group were making more offers per person than the users in the test group, which was not a metric the test was expected to affect. However, we were also running an A/B test that did target offers per user. On a lark, we checked for correlation between the experiments and found that the assignments were not independent according to a chi-squared test.
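
A check along those lines can be done in a few lines of Python. The sketch below uses made-up salts, a two-bucket split, and scipy’s chi-squared test; it’s illustrative rather than our exact tooling, but it shows the kind of dependence such a test can flag.

```python
# Sketch of a cross-experiment independence check (salts, sample size, and the
# use of scipy are illustrative assumptions, not our exact tooling).
import numpy as np
from scipy.stats import chi2_contingency

def assignments(user_ids, salt, num_buckets=2):
    return [hash(uid ^ salt) % num_buckets for uid in user_ids]

user_ids = range(100_000)
exp_a = assignments(user_ids, salt=12345)   # hypothetical experiment 1
exp_b = assignments(user_ids, salt=67890)   # hypothetical experiment 2

# Build a 2x2 contingency table of joint bucket assignments across the two experiments.
table = np.zeros((2, 2), dtype=int)
for a, b in zip(exp_a, exp_b):
    table[a, b] += 1

chi2, p_value, _, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p_value:.3g}")  # a tiny p-value means the assignments are not independent
```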

Our investigation finally led us to the hashing step itself as the root cause: the hash function we were using was generating correlated assignments across experiments. In some cases, depending on the particular pair of experiment salts, the correlation was negligible, while in other cases it became substantial. In fact, this problem is present in most non-cryptographic hash functions. Buried in a 2007 KDD paper from Microsoft is a mention of this issue:

“And if the hash function has characteristics (instances where a perturbation of the key produces a predictable perturbation of the hash code), then correlations may occur between experiments. Few hash functions are sound enough to be used in this technique.
We tested this technique using several popular hash functions and a methodology similar to the one we used on the pseudorandom number generators[…] We found that only the cryptographic hash function MD5 generated no correlations between experiments. SHA256 (another cryptographic hash) came close, requiring a five-way interaction to produce a correlation. The .NET string hashing function failed to pass even a two-way interaction test.”
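
As a point of comparison, swapping a cryptographic hash into the same kind of scheme is straightforward. The sketch below uses Python’s hashlib with MD5 and a made-up key encoding (a salt:user_id string rather than the XOR step above):

```python
# Sketch of MD5-based bucketing via hashlib; the key encoding here is an
# assumption for illustration, not our production format.
import hashlib

def md5_bucket(user_id: int, experiment_salt: str, num_buckets: int = 10) -> int:
    key = f"{experiment_salt}:{user_id}".encode("utf-8")   # combine salt and user ID
    digest = hashlib.md5(key).hexdigest()                  # cryptographic hash of the key
    return int(digest, 16) % num_buckets                   # mod down to a bucket index
```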

…fool me twice, SHAme on me

To get an intuition for the mechanics behind correlated hash bucketing, I’ve created a simplified demo of the phenomenon in Python. Our experimentation backend is actually written in Java, but the principles are the same in Python. Both languages’ default hashing functions are optimized for performance and not intended to be cryptographically secure.

Built-In Hash Function

In this demo I’m assuming a simple A/B test scenario where half the incoming users are assigned to bucket A and half are assigned to bucket B. For a pair of experiment salts, I’ve taken a set of 1,000 sequential user IDs and run them through Python’s built-in hashing function. I then mod-ed the result and bucketed each user into the appropriate variant.
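
A condensed sketch of that demo is below; the two experiment salts are made-up values, and the 50/50 A/B split is taken from the mod-ed output.

```python
# Simplified reconstruction of the demo; the two salts are made-up values.
user_ids = range(1000)                                 # 1,000 sequential user IDs
salts = {"experiment_1": 1001, "experiment_2": 2002}   # illustrative experiment salts

def bucket(user_id: int, salt: int) -> str:
    modded = hash(user_id ^ salt) % 10                 # mod-ed output of the built-in hash
    return "A" if modded < 5 else "B"                  # 50/50 split into the two variants

assignments = {name: [bucket(uid, salt) for uid in user_ids]
               for name, salt in salts.items()}
```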

Below I’ve plotted the mod-ed outputs and buckets for each experiment salt individually. You can clearly see that the outputs are non-random: the buckets of sequential users are related. In practice, however, this isn’t really problematic. Users are still distributed 50/50 across the buckets, and the mod-ing ensures that each bucket has a mix of both old and new users.
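
For completeness, a plot along those lines can be produced with something like the following matplotlib sketch (not the original plotting code):

```python
# Sketch of the plotting step (matplotlib usage and salts are illustrative).
import matplotlib.pyplot as plt

user_ids = range(1000)
salts = {"experiment_1": 1001, "experiment_2": 2002}   # same made-up salts as above

fig, axes = plt.subplots(len(salts), 1, sharex=True)
for ax, (name, salt) in zip(axes, salts.items()):
    modded = [hash(uid ^ salt) % 10 for uid in user_ids]   # mod-ed built-in hash outputs
    ax.scatter(user_ids, modded, s=2)
    ax.set_title(name)
    ax.set_ylabel("hash % 10")
axes[-1].set_xlabel("sequential user ID")
plt.tight_layout()
plt.show()
```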