How can I speed up a nested loop of lookup operations?

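I am programming the halftoning of images for laser engraving. In the given setup the laser is either on or off, so I can feed it binary images with 1-bit depth. I therefore convert grayscale images with 8-bit depth (0 to 255) into binary images with 1-bit depth (0 to 1).

Two images are shown below as an example. On the left is a grayscale image. On the right is the result of replacing every pixel with a 3x3 square of binary pixels. The results look similar because the gray arises from the density of black pixels.

My current attempt uses a nested loop to visit the pixels and replaces each one in the output image with a value looked up in a dictionary: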

import math
import time

import numpy as np

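# Five 2x2 tones, flattened row by row, with 0 to 4 lit pixels.
# The exact pixel order is an assumed layout; the code only needs sums 0..4.
TONES = [[0,0,0,0],
         [0,1,0,0],
         [1,1,0,0],
         [1,1,0,1],
         [1,1,1,1]]

def process_tones():
    """Converts the tones above to the right shape."""
    tones_dict = dict()

    for t in TONES: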
        brightness = sum(t)
        bitmap_tone = np.reshape(t,(2,2)) * 255
        tones_dict[brightness] = bitmap_tone
    return(tones_dict)

def halftone(gray,tones_dict):
    """Generate a new image where each pixel is replaced by one with the values in tones_dict.
    """

    num_rows = gray.shape[0]
    num_cols = gray.shape[1]
    num_tones = len(tones_dict)
    tone_width = int(math.sqrt(num_tones - 1))

    output = np.zeros((num_rows * tone_width,num_cols * tone_width),dtype = np.uint8)

    # Go through each pixel
    for i in range(num_rows):
        i_output = range(i * tone_width,(i + 1)* tone_width)

        for j in range(num_cols):
            j_output = range(j * tone_width,(j + 1)* tone_width)

            pixel = gray[i,j]
            # Cast to a Python int so the multiplication cannot overflow uint8
            brightness = int(round((num_tones - 1) * int(pixel) / 255))

            output[np.ix_(i_output,j_output)] = tones_dict[brightness]

    return output

def generate_gray_image(width = 100,height = 100):
    """Generates a random grayscale image.
    """

    return (np.random.rand(width,height) * 256).astype(np.uint8)

gray = generate_gray_image()
tones_dict = process_tones()

start = time.time()
for i in range(10):
    binary = halftone(gray,tones_dict = tones_dict)
duration = time.time() - start
print("Average loop time: " + str(duration))

The result is:

Average loop time: 3.228989839553833

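The average loop takes about 3 seconds for a 100x100 image, which seems much slower than comparable OpenCV functions.

I checked "How to speed-up python nested loop?" and "Looping over pixels in an image", but did not immediately see how to vectorize this operation.

How can I speed up this nested loop of lookup operations?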

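The trick is to avoid iterating at such a fine granularity as you do, and instead offload most of the work to optimized numpy functions.

Conceptually, we can treat the output image as a set of smaller images (let's call them "channels"), each holding the data for one position in the halftone grid.

The individual channel images can then be generated by a simple lookup, which in Numpy we can do by indexing a lookup table with the grayscale image (i.e. LUT[image]).

Lookup Tables

Let's say we define the "tile size" (the size of one halftone pattern) and the individual tone tiles in the following way: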
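As a minimal sketch of the LUT[image] idea, the snippet below applies a table to every pixel in one indexing step. The table `lut` (a plain threshold at 128) and the 2x2 `image` are made up for illustration and are not part of the answer's code.

```python
import numpy as np

# Hypothetical 256-entry lookup table: gray values below 128 map to 0, the rest to 255.
lut = np.where(np.arange(256) < 128, 0, 255).astype(np.uint8)

# A tiny made-up grayscale image.
image = np.array([[0, 100],
                  [128, 255]], dtype=np.uint8)

# Integer-array indexing looks up every pixel at once; the result has the image's shape.
channel = lut[image]
print(channel)
```

The output has the same shape as `image`, so one such lookup per tile position yields one channel image.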

TILE_SIZE = (2,2) # Rows,Cols

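# One row per tone, flattened row by row (assumed layout with 0 to 4 lit pixels)
TONES = np.array(
    [[0,0,0,0],
     [0,1,0,0],
     [1,1,0,0],
     [1,1,0,1],
     [1,1,1,1]],dtype=np.uint8) * 255

We first use np.linspace to compute the mapping between grayscale values and tone indices. Then, for each tile position, we build a lookup table from the tone definitions (using the same lookup technique to do so).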

def generate_LUTs(tones,tile_size):
    num_tones,num_tiles = tones.shape
    tile_rows,tile_cols = tile_size
    assert(num_tiles == (tile_rows * tile_cols))

    # Generate map between grayscale value and tone index
    gray_level = np.linspace(0,(num_tones - 1),256,dtype=np.float32)
    tone_map = np.uint8(np.round(gray_level))

    # Generate lookup tables for each tile
    LUTs = []
    for tile in range(num_tiles):
        LUTs.append(tones[:,tile][tone_map])

    return LUTs
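To make the grayscale-to-tone mapping concrete, here is a small sketch that builds the same `tone_map` on its own and counts how many gray values land on each tone. The value `num_tones = 5` is an assumption (a 2x2 tile with 0 to 4 lit pixels).

```python
import numpy as np

num_tones = 5  # assumed: five tones, as for a 2x2 tile with 0..4 lit pixels

# Spread the 256 gray values evenly over the tone indices 0..num_tones-1.
gray_level = np.linspace(0, num_tones - 1, 256, dtype=np.float32)
tone_map = np.uint8(np.round(gray_level))

# Count how many gray values are assigned to each tone index.
counts = np.bincount(tone_map, minlength=num_tones)
print(tone_map[[0, 32, 128, 255]])
print(counts)
```

Because the mapping rounds to the nearest tone, the darkest and brightest tones each cover about half as many gray values as the tones in between.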

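Merging the Channels

Now we merge the channels into a complete output image.

The first step is to reshape each channel image so that it has only a single column.

Then we can combine all the channel images that share the same halftone pattern row, using np.hstack.

Next, we reshape the results so that they have the same number of rows as the input image (i.e. they will now have twice as many columns for a 2x2 tile).

We combine all the reshaped images, again using np.hstack.

Finally, we reshape the result so that it has the correct number of rows (according to the tile size), and we are done.

In code (generalized to work for any tile size):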

def halftone(image,LUTs,tile_size):
    tiles = []
    for tile in range(len(LUTs)):
        tiles.append(LUTs[tile][image])

    image_rows,_ = image.shape
    tile_rows,tile_cols = tile_size

    merged_rows = []
    for row in range(tile_rows):
        row_tiles = tiles[row * tile_cols:(row + 1) * tile_cols]
        merged_row = np.hstack([row_tile.reshape(-1,1) for row_tile in row_tiles])
        merged_rows.append(merged_row.reshape(image_rows,-1))

    return np.hstack(merged_rows).reshape(image_rows * tile_rows,-1)

Example usage:

LUTs = generate_LUTs(TONES,TILE_SIZE)
binary = halftone(gray,LUTs,TILE_SIZE)

Example output:

And using 3x3 tiles:
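To see why the hstack and reshape sequence interleaves the channels correctly, the following sketch traces it on labeled toy data. Everything here is made up for illustration: each channel value encodes tile position * 10 + pixel number, so the final layout can be read off directly.

```python
import numpy as np

# Made-up 2x2 "image" and a 2x2 tile; each channel holds one tile position.
image_rows, image_cols = 2, 2
tile_rows, tile_cols = 2, 2
pixel_id = np.arange(image_rows * image_cols).reshape(image_rows, image_cols)
channels = [pos * 10 + pixel_id for pos in range(tile_rows * tile_cols)]

merged_rows = []
for row in range(tile_rows):
    row_channels = channels[row * tile_cols:(row + 1) * tile_cols]
    # One column per channel, then back to one row per image row.
    merged = np.hstack([c.reshape(-1, 1) for c in row_channels])
    merged_rows.append(merged.reshape(image_rows, -1))

# Stack the tile rows side by side, then fold them into separate output rows.
result = np.hstack(merged_rows).reshape(image_rows * tile_rows, -1)
print(result)
```

In the printed array, the top-left 2x2 block holds 0, 10, 20, 30, which are the four tile positions of pixel 0, and the other pixels follow the same pattern.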

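This problem can be solved very fast using pure numpy.

First, compute brightness in a vectorized way.
Next, index tones with brightness, which transforms gray into a 4D array of shape HxWx2x2.
Reorganize the array with np.transpose to interleave the dimensions introduced by tones with the original dimensions from gray. The image is transformed to Hx2xWx2.
"Flatten/merge" the vertical dimensions (H from gray and 2 from tones), and do the same for the horizontal dimensions (W from gray and 2 from tones). This is done by reshaping to (H*2)x(W*2).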
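The shape changes in the steps above can be followed on toy data. This is only a sketch: the `tones` array (three 2x2 tiles filled with consecutive numbers) and the `brightness` indices are invented so that each tile is recognizable in the result.

```python
import numpy as np

# Assumed toy data: 3 tones of size 2x2 and a 2x3 image of tone indices.
tones = np.arange(3 * 2 * 2).reshape(3, 2, 2)
brightness = np.array([[0, 1, 2],
                       [2, 1, 0]])
H, W = brightness.shape

tiles = tones[brightness]                    # shape (H, W, 2, 2)
interleaved = tiles.transpose(0, 2, 1, 3)    # shape (H, 2, W, 2)
image = interleaved.reshape(H * 2, W * 2)    # shape (H*2, W*2)

print(tiles.shape, interleaved.shape, image.shape)

# The tile of pixel (i, j) occupies image[2*i:2*i+2, 2*j:2*j+2].
print(np.array_equal(image[2:4, 0:2], tones[brightness[1, 0]]))
```

Without the transpose, the reshape would lay each tile out along a single row instead of as a 2x2 block.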

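Paste the following code below the code from the question and run it.

def process_tones2():
    # Stack the tones into an array of shape (num_tones, size, size)
    tones = np.array(TONES,dtype='u1')
    size = int(np.sqrt(tones.shape[-1]))
    tones = 255 * tones.reshape(-1,size,size)
    # Mean brightness of each tone, used to sort the tones from dark to bright
    bins = tones.sum(axis=(-2,-1),dtype=int) // size ** 2
    iperm = np.argsort(bins)
    return bins[iperm],tones[iperm]

def halftone_fast(gray,bins,tones):
    height,width = gray.shape
    tone_height,tone_width = tones.shape[-2:]
    # Tone index of every pixel, computed in one vectorized step
    brightness = np.round(gray / 255 * (len(tones) - 1)).astype('u1')
    # Shape (H, W, tone_height, tone_width)
    binary4d = tones[brightness]
    # Interleave tone rows with image rows: (H, tone_height, W, tone_width)
    binary4d = binary4d.transpose((0,2,1,3))
    binary = binary4d.reshape(height * tone_height,width * tone_width)
    return binary

bins,tones = process_tones2()
start = time.time()
for i in range(10):
    binary2 = halftone_fast(gray,bins,tones)
duration = time.time() - start
print("Average loop time: " + str(duration))
print("Error:",np.linalg.norm(binary.astype(float) - binary2))

On my machine I get the following results:

Average loop time: 2.3393328189849854
Average loop time: 0.0032405853271484375
Error: 0.0

Going by these timings, the speedup is roughly 700x.

Note that the argument bins is not used in halftone_fast(). The reason is that it is not needed for halftoning. The code works only if TONES forms a linear space of brightness levels (from all 0s to all 1s). In that case brightness acts as an index into the sorted list of tones.

If the mapping is not linear, one must use np.digitize(gray,bins) to compute the proper indices into the tones array.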
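A minimal sketch of the non-linear case follows. The `edges` thresholds and the `gray` values are hypothetical, chosen only to show how np.digitize turns unevenly spaced brightness boundaries into indices; they are not the `bins` returned by process_tones2().

```python
import numpy as np

# Hypothetical, unevenly spaced boundaries between 4 tones (made up for illustration).
edges = np.array([32, 96, 200])
gray = np.array([[0, 31, 32],
                 [95, 199, 255]], dtype=np.uint8)

# For each pixel, np.digitize returns how many edges are <= its value,
# which can be used directly as an index into a sorted tones array.
index = np.digitize(gray, edges)
print(index)
```

With three edges the indices run from 0 to 3, so a tones array used with them would need four entries sorted from dark to bright.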

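Your algorithm seems to have two parts: computing the "brightness" of each pixel, and replacing each pixel with a halftone dot.

First, I'll assume the input image has shape (h, w).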

grayscale = np.array(...)
h,w = grayscale.shape

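Brightness levels

Computing the brightness takes two steps:

Determine the bounds of each brightness level. This can be done by using np.linspace to divide the range [0, 256) into num_tones equally sized chunks.
```
bins = np.linspace(0,256,num_tones + 1)
# e.g. with 4 tones: [0,64,128,192,256]
```
Determine which level each pixel falls into. This can be done with np.digitize.
```
# (subtract 1 because digitize counts from 1)
levels = np.digitize(grayscale,bins) - 1  # shape (h,w)
```
Then levels[i,j] is the brightness level of grayscale[i,j] (from 0 to num_tones - 1, inclusive).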
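As a worked example of the two steps, the sketch below runs them on a handful of made-up gray values. `num_tones = 4` is assumed to match the comment in the snippet above, and the `grayscale` values are chosen to sit on either side of the bin edges.

```python
import numpy as np

num_tones = 4  # assumed value
bins = np.linspace(0, 256, num_tones + 1)   # edges at 0, 64, 128, 192, 256

# Made-up pixels next to the bin edges.
grayscale = np.array([[0, 63, 64],
                      [127, 192, 255]], dtype=np.uint8)

# digitize counts bins from 1, so subtract 1 to get levels 0..num_tones-1.
levels = np.digitize(grayscale, bins) - 1
print(levels)
```

Since the last edge is 256 and uint8 values stop at 255, no pixel can fall past the final bin.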

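Halftoning

Now that you have the brightness level of each pixel, you can use the levels as keys to fetch the halftone matrices. To make this as simple as possible, you need to store the halftones in a Numpy array instead of a dictionary.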

tones = np.array(...)  # shape(num_tones,x,y)
x,y = tones.shape[1:]

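By using the image's brightness levels as an index array¹ into tones, you get the halftone matrix of each pixel.

halftones = tones[levels]  # shape (h,w,x,y)
# halftones[i,j] is the halftone for grayscale[i,j]

Then it is just a matter of putting the elements in the right order and flattening the array.

# Reorder axes so halftone rows are before image columns
ordered = halftones.swapaxes(1,2)  # shape (h,x,w,y)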

# Make it 2-dimensional
result = ordered.reshape(h * x,w * y)
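Put together, the fragments above might look like the function below. This is a sketch assembled from the snippets in this answer, not the author's actual script; the name `np_halftone` is borrowed from the timing table, and the five 2x2 tones are an assumed example set.

```python
import numpy as np

def np_halftone(grayscale, tones):
    """Replace each pixel of `grayscale` with the tone for its brightness level."""
    h, w = grayscale.shape
    num_tones, x, y = tones.shape
    bins = np.linspace(0, 256, num_tones + 1)
    levels = np.digitize(grayscale, bins) - 1   # shape (h, w)
    halftones = tones[levels]                   # shape (h, w, x, y)
    ordered = halftones.swapaxes(1, 2)          # shape (h, x, w, y)
    return ordered.reshape(h * x, w * y)

# Assumed example tones: five 2x2 patterns with 0 to 4 lit pixels.
tones = np.array([[0, 0, 0, 0],
                  [0, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 1],
                  [1, 1, 1, 1]], dtype=np.uint8).reshape(-1, 2, 2) * 255

# A made-up 1x2 image: one black pixel and one white pixel.
grayscale = np.array([[0, 255]], dtype=np.uint8)
result = np_halftone(grayscale, tones)
print(result)
```

The black pixel becomes an all-zero 2x2 block and the white pixel an all-255 block, giving a 2x4 output.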

Speed

I wrote a script to compare the speed of the original code, my answer, and tstanisl's answer. The results:

Best times
halftone:      0.346237126000000
np_halftone:   0.000565907715000
halftone_fast: 0.000437084295000

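Both answers run several hundred times faster than the original code (about 600 times for mine and 800 times for tstanisl's), with tstanisl's being roughly 30% faster than mine.

In exchange for that speed, my function has one minor advantage over both tstanisl's and the original: if you want to use custom tones whose total values do not correspond directly to brightness, this algorithm still works (for example, if you want to invert the colors in the halftone). Otherwise, tstanisl's is more efficient.

¹ The last example in the linked section of the Numpy user guide is actually very similar to this one; it discusses mapping image color values to RGB triples.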
