使用浮点键来实现类似于哈希表的数据结构，其中将公差范围内的值合并在一起

2024-05-19 • 问答

我需要一个带有浮点键的关联数据结构，其中具有几乎相等值的键被合并在一起。我正在使用C ++，但是语言并不重要。

基本上我目前的策略是

仅处理单精度浮点数
使用具有自定义键类型的unordered_map
在键类型上将哈希函数定义为

a。给定的浮点数v以两倍的精度将v除以某个公差（例如0.0005），得到k。

b。将k转换为64位整数，产生ki

c。返回ki的std :: hash。

首先，是否有一个标准的命名数据结构可以执行以下操作？是否没有比我一般的方法更好的方法呢？

对于以下实现，我最不喜欢的是对哪个浮点值进行装箱不直观。通过大致了解我想将输入中的哪些值视为相同的值并测试各种容差，我可以解决此问题，但是如果将12.0453添加到容器中，那么将得到12.0453 +/- 0.0005的值将是一个很好的解决方案如果公差参数为0.0005，则认为是相等的，但事实并非如此-我什至不认为在unordered_map上可能会发生这种行为，因为我认为哈希函数将取决于表中的值。

基本上，我的实现是将数字行划分为一维网格，其中每个网格单元的宽度均为epsilon单位，然后将浮点值分配给它们所属的网格单元的从零开始的索引。我的问题是，有没有更好的办法来实现公差也为O（1）的浮点值的关联容器？并且下面的实现是否存在问题？

    template<typename V,int P=4>
    class float_map
    {
    private:
        struct key {
        public:
            long long val;

            static constexpr double epsilon(int digits_of_precision)
            {
                return (digits_of_precision == 1) ? 0.5 : 0.1 * epsilon(digits_of_precision - 1);
            }

            static constexpr double eps = epsilon(P);

            key(float fval) : val(static_cast<long long>( fval / eps))
            {}

            bool operator==(key k) const {
                return val == k.val;
            }
        };

        struct key_hash
        {
            std::size_t operator()(key k) const {
                return std::hash<long long>{}(k.val);
            }
        };

        std::unordered_map<key,V,key_hash> impl_;

    public:
        V& operator[](float f) {
            return impl_[key(f)];
        }

        const V& at(float f) const {
            return impl_.at(key(f));
        }

        bool contains(float f) const {
            return impl_.find(f) != impl_.end();
        }

        double epsilon() const {
            return key::eps;
        }
    };

    int main()
    {
        float_map<std::string> test;

        test[12.0453f] = "yes";

        std::cout << "epsilon = " << test.epsilon() << std::endl;                             // 0.0005

        std::cout << "12.0446f => " << (test.contains(12.0446f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0447f => " << (test.contains(12.0447f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0448f => " << (test.contains(12.0448f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0449f => " << (test.contains(12.0449f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0450f => " << (test.contains(12.0450f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0451f => " << (test.contains(12.0451f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0452f => " << (test.contains(12.0452f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0453f => " << (test.contains(12.0453f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0454f => " << (test.contains(12.0454f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0455f => " << (test.contains(12.0455f) ? "yes" : "no") << std::endl; // yes
        std::cout << "12.0456f => " << (test.contains(12.0456f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0457f => " << (test.contains(12.0457f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0458f => " << (test.contains(12.0458f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0459f => " << (test.contains(12.0459f) ? "yes" : "no") << std::endl; // no
        std::cout << "12.0460f => " << (test.contains(12.0460f) ? "yes" : "no") << std::endl; // no

    }

delete0012005 回答：使用浮点键来实现类似于哈希表的数据结构，其中将公差范围内的值合并在一起

嗯，也许您可以使用以整数为键的unordered_map，然后通过以下方式确定键：

key = floor（val / precision）;

这是相当透明的，键0将包含0.0到0.0005之间的值（或任何精度）。同样，负数也可以在逻辑上起作用。

如果要使用二维值执行此操作，则可能要研究地质哈希。

做到这一点的最佳方法是使用定点算法。

问题详细信息中的实现有效，但比需要的更加混乱。它实际上被视为epsilon或公差的方法是“箱宽”-划分实数线的网格线之间的一维间距，因此，如果您期望epsilon值像公差一样，您会注意到垃圾桶边缘/网格线附近的违反直觉的行为。

在任何情况下，考虑此问题的更清晰方法是不尝试使用“公差”概念，而是使用“有效数字”概念。仅将n上的n小数点后10位以数字为准，并对其进行参数化。这实际上导致使用定点值作为键，而不是浮点值。在上述实现中，类似于使用0.0001而不是0.0005的epsilon。

但是，现在不仅有理由不只是将固定点值设置为公共类型，而且不将定点值设置为公共类型，并使用该类型作为向用户公开的unordered_map的键，就没有理由了。以前，我们想通过将实现的unordered_map包装在自定义数据结构中来隐藏键类型，因为在这种情况下，键是不透明的，没有直观的含义。在普通的unordered_map中使用定点键具有这样做的好处，因为我们现在为用户提供了实际的unordered_map，因此我们不必为所有标准std :: unordered_map调用实现包装方法。

以下代码：

template<int P=4>
class fixed_point_value
{
    static constexpr double calc_scaling_factor(int digits_of_precision)
    {
        return (digits_of_precision == 1) ? 10.0 : 10.0 * calc_scaling_factor(digits_of_precision - 1);
    }

    static constexpr double scaling_factor = calc_scaling_factor(P);

    template<int P>
    friend struct fixed_point_hash;

public:
    fixed_point_value(float val) : 
        impl_(static_cast<long long>(std::llround(scaling_factor * val)))
    {}

    bool operator==(fixed_point_value<P> fpv) const 
    {
        return impl_ == fpv.impl_;
    }

    float to_float() const
    {
        return static_cast<float>(impl_ / scaling_factor);
    }

private:
    long long impl_;
};

template<int P = 4>
struct fixed_point_hash
{
    std::size_t operator()(fixed_point_value<P> key) const {
        return std::hash<long long>{}(key.impl_);
    }
};

template<typename V,int P = 4>
using fixed_point_table = std::unordered_map<fixed_point_value<P>,V,fixed_point_hash<P>>;

int main()
{
    fixed_point_table<std::string,4> t4;

    t4[12.0453f] = "yes";

    // these will all be "no" except 12.0453f because we have 4 base-10 digits of precision i.e.
    // 4 digits right of the decimal must be an exact match
    std::cout << "precision = 4" << std::endl;
    std::cout << "12.0446f => " << (t4.find(12.0446f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0447f => " << (t4.find(12.0447f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0448f => " << (t4.find(12.0448f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0449f => " << (t4.find(12.0449f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0450f => " << (t4.find(12.0450f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0451f => " << (t4.find(12.0451f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0452f => " << (t4.find(12.0452f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0453f => " << (t4.find(12.0453f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0454f => " << (t4.find(12.0454f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0455f => " << (t4.find(12.0455f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0456f => " << (t4.find(12.0456f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0457f => " << (t4.find(12.0457f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0458f => " << (t4.find(12.0458f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0459f => " << (t4.find(12.0459f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "12.0460f => " << (t4.find(12.0460f) != t4.end() ? "yes" : "no") << std::endl;
    std::cout << "\n";

    fixed_point_table<std::string,3> t3;
    t3[12.0453f] = "yes"; // 12.0453 will round to the fixed point value 12.045.
    std::cout << "precision = 3" << std::endl;
    std::cout << "12.0446f => " << (t3.find(12.0446f) != t3.end() ? "yes" : "no") << std::endl; // rounds to 12.0445 so yes;
    std::cout << "12.0447f => " << (t3.find(12.0447f) != t3.end() ? "yes" : "no") << std::endl; // rounds to 12.0445 so yes;
    std::cout << "12.0448f => " << (t3.find(12.0448f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0449f => " << (t3.find(12.0449f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0450f => " << (t3.find(12.0450f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0451f => " << (t3.find(12.0451f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0452f => " << (t3.find(12.0452f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0453f => " << (t3.find(12.0453f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0454f => " << (t3.find(12.0454f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0455f => " << (t3.find(12.0455f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0456f => " << (t3.find(12.0456f) != t3.end() ? "yes" : "no") << std::endl; // 12.0456f rounds to the 3 digits of precison fixed point value 12.0456 so no
    std::cout << "12.0457f => " << (t3.find(12.0457f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0458f => " << (t3.find(12.0458f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0459f => " << (t3.find(12.0459f) != t3.end() ? "yes" : "no") << std::endl; // '
    std::cout << "12.0460f => " << (t3.find(12.0460f) != t3.end() ? "yes" : "no") << std::endl; // '

}

仅将数据点归类在一起可能无法提供您想要的结果，因为在bin边界的两侧总是会存在非常紧密的点。您需要使用其他方法。

例如：

比方说，您将域划分为epsilon边的正方形。然后，您可以构建一个std::map，将每个数据点分配给一个正方形。并给定任意点P=(x,y)，您可以找到包含S(P)的正方形P。现在，您要做的是查看一个以S(P)为中心的3x3网格中的所有9个正方形。然后，您可以扫描这九个仓，找到与P最接近的数据点。

如果存在某个点，则可以保证此方法在距epsilon (x,y)的距离内找到一个点。

c++data-structures fixed-point floating-point hashtable

本文链接：https://www.f2er.com/3141192.html

使用浮点键来实现类似于哈希表的数据结构，其中将公差范围内的值合并在一起

delete0012005 回答：使用浮点键来实现类似于哈希表的数据结构，其中将公差范围内的值合并在一起

大家都在问