如何在分布式系统中以相同的方式随机化两个String集合?

我有几个字符串集合(约100k个元素)。它们是从相同的网络源初始化的:

List<String> source = Arrays.asList("cat Tom","dog Woof","bird Carry","fish Silent","mice Dr Black");
List<String> list1 = new ArrayList<>(source);
List<String> list2 = new ArrayList<>(source); 
List<String> list3 = new ArrayList<>(source);

在他们的一生中,他们可以独立进行修改:

list1.add("raccoon George");
list2.set(2,"bird Frank"); // Different bird
list3.remove(2); // No birds!

这些集合位于不同的JVM中,但是具有相同的(共享的)种子:

String seed = UUID.randomUUID().toString();

我需要某种排序/混洗方法(或Comparator)来随机分配这些集合:

magicSort(list1,seed); // JVM #1
magicSort(list2,seed); // JVM #2
magicSort(list3,seed); // JVM #3

要像这样接收一致/可重复的输出:

[mice Dr Black,bird Carry,raccoon George,cat Tom,dog Woof,fish Silent]
[mice Dr Black,bird Frank,fish Silent]

这不起作用:

  • Collections.shuffle()来自this answer-因为它不提供对具有相同前缀的项目的亲和力;
  • Map<String,UUID>中保持随机顺序-因为(a)集合的动态性质和(b)分布式设置。

任何帮助将不胜感激

编辑#1

这不起作用:

  • sha1md5和类似的哈希算法-因为它们不提供对具有相同前缀的项目的相似性-bird Carrybird Frank可以放置得很远。

编辑#2

其他条件:具有相同前缀的项目应分组在一起(相似性),例如所有的鸟都放在老鼠和浣熊之间。

huang118118bing 回答:如何在分布式系统中以相同的方式随机化两个String集合?

非常感谢您在评论中使用哈希提示!

我最终通过伪随机散列(类似于Fisher-Yates随机播放)重新映射了字符顺序。其余代码是从String.compareTo复制粘贴的:

private void magicSort(List<String> list,String seed) {
    list.sort(new ShuffleComparator(seed));
}

public class ShuffleComparator implements Comparator<String> {
    private static final long MAGIC = 0x5DEECE66DL;
    private final String seed;

    public ShuffleComparator(String seed) {
        this.seed = seed;
    }

    @Override
    public int compare(String s1,String s2) {
        int len1 = s1.length();
        int len2 = s2.length();
        int lim = Math.min(len1,len2);

        int seedLen = seed.length();
        long random = seed.hashCode();

        for (int k = 0; k < lim; k++) {
            random = random * MAGIC + seed.charAt(k % seedLen);
            char c1 = s1.charAt(k);
            char c2 = s2.charAt(k);
            if (c1 != c2) {
                return ((random % (c1 + 0xFFFF)) & 0xFFFF) -
                    ((random % (c2 + 0xFFFF)) & 0xFFFF) > 0 ? 1 : -1;
            }
        }
        return (random > 0 ? 1 : -1) * (len1 - len2);
    }
}

结果:

  • 您可以随时对收藏进行随机排序
  • 前缀相同的项目分组在一起(相似性)
  • 唯一的资源(字符串种子)在JVM之间共享
  • 对性能的影响可以忽略不计
  • 零GC

示例:

seed: b85a9119-3a98-4491-b503-fc2cfdc344f3
[raccoon George,bird Carry,mice Dr Black,fish Silent,cat Tom,dog Woof]
[bird Frank,dog Woof]
[mice Dr Black,dog Woof]

seed: d5378421-d0ea-4b17-80e9-1de6a163bf38
[bird Carry,dog Woof,raccoon George,cat Tom]
[bird Frank,cat Tom]
[dog Woof,cat Tom]

seed: a5f94f60-1207-4f83-9723-bbab5e3b8075
[cat Tom,fish Silent]
[cat Tom,bird Frank,fish Silent]

seed: 4b77a61f-c639-42fc-96b2-e1add9f359e9
[bird Carry,fish Silent]
[bird Frank,fish Silent]

seed: 2dd78e21-d4ad-4e8a-b139-5d35fe2b0481
[dog Woof,cat Tom]

编辑#1

对于测试,我还实现了基于哈希的比较器,因此可以在此处发布。万一您不需要亲和力(例如我的情况),但需要纯随机播放:

private void magicSort(List<String> list,String seed) {
    list.sort(new HashComparator(seed,"SHA-1"));
}

public class HashComparator implements Comparator<String> {
    private static final long MAGIC = 0x5DEECE66DL;
    private final ThreadLocal<MessageDigest> messageDigest;
    private final byte[] seed;
    private final int seedHash;

    public HashComparator(String seed,String algorithm) {
        this.seed = seed.getBytes(ISO_8859_1);
        this.seedHash = seed.hashCode();
        this.messageDigest = ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance(algorithm);
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e.getMessage(),e);
            }
        });
    }

    @Override
    public int compare(String s1,String s2) {
        if (s1.equals(s2)) {
            return 0;
        }
        int diff = getSignature(s1).compareTo(getSignature(s2));
        if (diff != 0) {
            return diff;
        }
        long random = seedHash;
        random = random * MAGIC + s1.hashCode();
        random = random * MAGIC + s2.hashCode();
        return ((random & 1) == 0 ? 1 : -1) * s1.compareTo(s2);

    }

    private ByteBuffer getSignature(String s) {
        MessageDigest digest = messageDigest.get();
        digest.reset();
        digest.update(seed);
        for (int i = 0,size = s.length(); i < size; i++) {
            char c = s.charAt(i);
            digest.update((byte) (c >> 8));
            digest.update((byte) c);
        }
        return ByteBuffer.wrap(digest.digest());
    }
}

结果:

  • 您可以随时对收藏进行随机排序
  • 纯随机播放(前缀相同的项目不相关)
  • 唯一的资源(字符串种子)在JVM之间共享

示例:

8d465fde-9310-4b6c-9709-5cc395deb45e
[raccoon George,bird Carry]
[bird Frank,cat Tom]
[mice Dr Black,cat Tom]

6daa90dd-f7e7-4615-a45c-fefb0de10c20
[bird Carry,raccoon George]
[mice Dr Black,dog Woof]

9f30b259-c8d1-41f5-8221-40b17cb04260
[cat Tom,fish Silent]

94b6710f-3232-453d-8009-6d81658b2cca
[dog Woof,raccoon George]
[dog Woof,bird Frank]
[dog Woof,mice Dr Black]
本文链接:https://www.f2er.com/3078803.html

大家都在问