Efficient byte-wise rotation within a 32-bit word in ARM assembly

2024-05-14 • Q&A

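Suppose we have a 32-bit register made up of four bytes, R = b0|b1|b2|b3.

What I want to do is compute R' such that R' = (b0 <<< x) | (b1 <<< x) | (b2 <<< x) | (b3 <<< x), where x is an arbitrary value and <<< denotes a left rotation within a byte (e.g. 10101110 <<< 2 = 10111010).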
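To make the target semantics concrete, here is a naive reference sketch in C that rotates each byte separately. The names rotl8 and byte_rotl_ref are hypothetical (they come from no library), and the restriction to counts 0..8 is an assumption. It is meant as something to check faster versions against, not as the efficient solution being asked for:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a single 8-bit value left by n bits, assuming 0 <= n <= 8. */
static uint8_t rotl8(uint8_t b, unsigned n)
{
    assert(n <= 8);
    /* b is promoted to int, so shifting by up to 8 is well defined. */
    return (uint8_t)((b << n) | (b >> (8u - n)));
}

/* Reference: rotate each of the four bytes of r left by n, independently. */
static uint32_t byte_rotl_ref(uint32_t r, unsigned n)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(r >> (8u * i));          /* extract byte i */
        out |= (uint32_t)rotl8(b, n) << (8u * i);      /* put it back rotated */
    }
    return out;
}
```

With the example from above, every byte 10101110 (0xAE) rotated left by 2 becomes 10111010 (0xBA).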

What is the most efficient way to do this in ARM assembly?

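We can implement the rotation with shifts and mask off the unwanted bits to get the desired effect. This gives us something like the following C code:

/* byte-wise right rotate */
/* R is the rotate count (0..8), x is the 32-bit value to rotate */
unsigned brrot(unsigned R, unsigned x)
{
    unsigned mask;

    /* in each byte, the upper 8-R bits are set (R = 2 gives 0xFCFCFCFC) */
    mask = 0x01010100U - (0x01010101U << R);

    return ((x & mask) >> R | (x & ~mask) << (8-R));
}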
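Since the question asks for a left rotation while the function above rotates right, it may help to note that rotating a byte left by n is the same as rotating it right by 8 - n. A minimal sketch under that observation, restating the function with fixed-width types so the block stands alone; brrot32 and brlrot32 are hypothetical names, and a 32-bit unsigned type is assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise right rotate of x by r bits, 0 <= r <= 8 (same logic as brrot). */
static uint32_t brrot32(unsigned r, uint32_t x)
{
    assert(r <= 8);
    /* In each byte the upper 8-r bits are set, e.g. r = 2 gives 0xFCFCFCFC. */
    uint32_t mask = UINT32_C(0x01010100) - (UINT32_C(0x01010101) << r);
    return ((x & mask) >> r) | ((x & ~mask) << (8u - r));
}

/* Byte-wise left rotate by n bits, expressed as a right rotate by 8 - n. */
static uint32_t brlrot32(unsigned n, uint32_t x)
{
    assert(n <= 8);
    return brrot32(8u - n, x);
}
```

In the assembly version this would amount to computing 8 - n up front, so the left rotation should not need a separate code path.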

Translated to ARM Thumb assembly, this should give us:

@ in: r0 = R (rotate count), r1 = x; out: r0
ldr r2,=0x01010101      @ load 0x01010101
sub r3,r2,#1            @ compute 0x01010100
sub r2,r3,r2,lsl r0     @ compute mask
and r3,r1,r2            @ compute x & mask
bic r2,r1,r2            @ compute x & ~mask
lsr r3,r3,r0            @ compute (x & mask) >> R
rsb r0,r0,#8            @ compute 8 - R
orr r0,r3,r2,lsl r0     @ compute (x & mask) >> R | (x & ~mask) << (8 - R)

If the carry flag is known to be clear before this sequence, you can save one instruction by replacing the two subtractions with

sbc r2,r2,r2,lsl r0   @ compute mask
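The reason this works is that SBC computes Rn - Op2 - 1 when the carry flag is clear, so the extra "- 1" takes the place of the separate subtraction that produced 0x01010100. A small C model of the arithmetic, as a sketch only; both function names are hypothetical and the carry-clear precondition is assumed, not checked:

```c
#include <assert.h>
#include <stdint.h>

/* Mask as computed by the two-subtraction sequence: (c - 1) - (c << r). */
static uint32_t mask_two_subs(unsigned r)
{
    assert(r <= 8);
    uint32_t c = UINT32_C(0x01010101);
    return (c - 1u) - (c << r);
}

/* Model of a single SBC with the carry flag clear: Rn - Op2 - 1. */
static uint32_t mask_sbc_carry_clear(unsigned r)
{
    assert(r <= 8);
    uint32_t c = UINT32_C(0x01010101);
    return c - (c << r) - 1u;
}
```

Both functions yield the same mask for every count from 0 to 8, since the two expressions only differ in the order of the subtractions.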

Something along the following lines:
(source value in R0, result in R0)

LDR R1,=0xC0C0C0C0     @ mask for 2 MS bits of each byte (pseudo-instruction)
LDR R2,=0xFCFCFCFC     @ negative mask for 2 LS bits of each byte (pseudo-instruction)
AND R1,R1,R0           @ R1 holds the values of 2 MS bits of each byte of R0
MOV R0,R0,LSL #2       @ Shift R0 by 2 bits to the left (2 MS bits are discarded)
AND R0,R0,R2           @ Zero out 2 LS bits of each byte
ORR R0,R0,R1,LSR #6    @ Move the 'extracted' 2 MS bits of each byte to destination

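I used the LDR '=' pseudo-instruction because I'm lazy; there may be a more optimal way to generate these masks...

Edit (cheers to @PeterCordes)

Yes, one of the shifts can be folded into the AND, and we can also slightly reorder the operations and use different masks, but the result will be roughly the same either way.

In C, it looks like this: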

unsigned byte_rot2l(unsigned x) {
    unsigned result;
    result = ((x<<2) & 0xfcfcfcfc);
    result |= ((x>>6) & 0x03030303);
    return (result);
}

In 32-bit ARM, this can be expressed as:

LDR R2,=0xFCFCFCFC     @ mask for 6 MS bits of each byte (pseudo-instruction)
LDR R1,=0x03030303     @ mask for 2 LS bits of each byte (pseudo-instruction)
AND R2,R2,R0,LSL #2    @ R2 := R0 shifted left by 2 bits, zero out the 2 LS bits of each byte (R0 remains unchanged)
AND R0,R1,R0,LSR #6    @ R0 := R0 shifted right by 6 bits, zero out all but the 2 LS bits of each byte
ORR R0,R0,R2           @ "Combine" the bits together
MOV PC,LR              @ Return result in R0

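Edit #2
The second line, which the assembler translates into a PC-relative load of a 32-bit constant from the literal pool, can be replaced with:

MVN R1,R2

so there is no need to store 0x03030303 in the literal pool. However, no matter which compiler options I try, I struggle to understand why gcc on Godbolt fails to use this optimisation. Does anybody have an idea?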
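In C terms, the MVN substitution corresponds to deriving the second mask as the bitwise complement of the first instead of writing it as a separate constant. A minimal sketch of that idea; byte_rot2l_one_mask is a hypothetical name, and whether a given compiler turns the complement into MVN or a second literal-pool load is up to the compiler:

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise rotate left by 2 using one mask constant and its complement. */
static uint32_t byte_rot2l_one_mask(uint32_t x)
{
    const uint32_t hi6 = UINT32_C(0xFCFCFCFC); /* 6 MS bits of each byte */
    const uint32_t lo2 = ~hi6;                 /* 2 LS bits of each byte */
    assert(lo2 == UINT32_C(0x03030303));       /* the value MVN would produce */
    return ((x << 2) & hi6) | ((x >> 6) & lo2);
}
```

For the byte pattern from the question, 0xAE (10101110) in every byte yields 0xBA (10111010) in every byte.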
