Selecting a percentage by row from a binary array?

2024-05-19 • Q&A

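I have a series of binary rows (bit arrays). What I want is to select a specific percentage of the bits from each row. For example, suppose there are 100 ones in every row: I want to randomly take 20% from the first row, 10% from the second, 40% from the third, and 30% from the fourth (100% in total).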

0| 00000000001000000010000000000000000000001000000100000000000000000000000000000001 ... 
1| 00000000000000010000000000001000000000000100000000000000000000000000000000000000 ... 
2| 00000000000000000000000000000010010000000000000000000000000000010000100000000000 ... 
3| 01000000000000100000000000000000000000001000100000000000000010000000000000000000 ...

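That part is easy: just call random.choice(one_idxs, %) on each row. The problem is that the result must also contain exactly 100 bits. If some bit positions overlap between rows and the random selection picks them more than once, the total will differ from 100.

In addition, each row should try to select bits that were not already selected by earlier rows.

Any ideas?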

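For example, this is the code I use for the simple case (it does not account for selected indices repeating across rows, only within a single row):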

for i in range(len(probs)):
    ones_count = 100
    bits_cnt = int(ones_count * probs[i])   # number of bits to take from row i
    idxs = array.get_row(i).one_idxs()      # indices of the 1 bits in row i
    selected = np.random.choice(idxs, size=bits_cnt, replace=False)

I only need to select the ones, which is why I work with indices.
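One way to keep the total at exactly the target is to process the rows in order and drop already-selected indices from each row's candidates before sampling. The following is only a minimal sketch using the standard library. `pick_unique_ones`, `rows`, and `counts` are hypothetical names, the rows are assumed to be equal-length strings of '0'/'1', and the per-row counts are assumed to be computed beforehand and to sum to the desired total.

```python
import random

def pick_unique_ones(rows, counts, rng=random):
    """Pick counts[i] one-bit indices from rows[i], never reusing an index.

    rows   -- list of '0'/'1' strings of equal length (assumption of this sketch)
    counts -- how many indices to take from each row
    Returns a list of lists: the indices chosen for each row.
    """
    used = set()   # indices already taken by earlier rows
    picks = []
    for row, k in zip(rows, counts):
        # candidates: 1-bits in this row that no earlier row has taken
        candidates = [i for i, bit in enumerate(row)
                      if bit == '1' and i not in used]
        if k > len(candidates):
            raise ValueError("not enough unused 1-bits in this row")
        chosen = rng.sample(candidates, k)
        used.update(chosen)
        picks.append(chosen)
    return picks

# small made-up example: 6 unique indices in total, split 2 / 1 / 3
rows = ['1101100110',
        '0111011010',
        '1010110101']
picks = pick_unique_ones(rows, [2, 1, 3])
print(picks)
```

Because earlier rows get first choice, a later row can run out of unused ones; in that case this sketch raises an error rather than silently returning fewer bits. Processing the rows in a different order (for example, rows with the fewest ones first) is a design choice left open here.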

Using a list of strings for convenience instead of bit arrays, and taking 4 samples...

In [39]: data = ['10000101',
    ...:         '11110000',
    ...:         '00011000']

In [40]: idxs = random.sample(range(len(data[0])),4)                           

In [41]: # 20% row 1,30% row 2,50% row 3                                      

In [42]: row_selections = random.choices(range(len(data)),[0.2,0.3,0.5],k=len(idxs))                                                               

In [43]: idxs                                                                   
Out[43]: [7,3,1,4]

In [44]: row_selections                                                         
Out[44]: [0,2,1]

In [45]: picks = [ data[r][c] for (r,c) in zip(row_selections,idxs)]          

In [46]: picks                                                                  
Out[46]: ['1','1','0','0']
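Note that `random.choices` draws the row for each column independently, so the 20/30/50 split holds only on average. If the split has to be exact, one option is to build a list with the exact number of labels per row and shuffle it. This is a sketch under assumptions: the data is made up, 10 columns are drawn so that the percentages give whole numbers, and `row_counts` and `row_labels` are illustrative names that are not in the session above.

```python
import random

# made-up rows, 16 columns each
data = ['1000010111010010',
        '1111000001100101',
        '0001100010011100']

n_picks = 10
idxs = random.sample(range(len(data[0])), n_picks)   # column positions, no repeats

# exact number of picks per row: 20%, 30%, 50% of 10
row_counts = [2, 3, 5]
row_labels = [r for r, n in enumerate(row_counts) for _ in range(n)]
random.shuffle(row_labels)   # random assignment of rows to columns, exact totals

picks = [data[r][c] for r, c in zip(row_labels, idxs)]
print(row_labels)
print(picks)
```

Like the session above, this can still return zeros, since the columns are drawn without looking at the bit values.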

OK, in light of your comments, this should work better as an example of how to select only the ones, proportionally, from each list/array:

import random
a1= '10001010111110001101010101'
a2= '00101010001011010010100010'
a1 = [int(t) for t in a1]
a2 = [int(t) for t in a2]
a1_one_locations= [idx for idx,v in enumerate(a1) if v==1]
a2_one_locations= [idx for idx,v in enumerate(a2) if v==1]

# lists of indices where 1 exists in each list...
print(a1_one_locations)
print(a2_one_locations)

n_samples = 6 # total desired

# 40% from a1,remainder from a2
a1_samples = int(n_samples * 0.4)
a2_samples = n_samples - a1_samples
a1_picks = random.sample(a1_one_locations,a1_samples)
a2_picks = random.sample(a2_one_locations,a2_samples)

# print results
print('indices from a1: ',a1_picks)
print('indices from a2: ',a2_picks)

Output:

[0,4,6,8,9,10,11,12,16,17,19,21,23,25]
[2,4,6,10,12,13,15,18,20,24]
indices from a1:  [6,21]
indices from a2:  [10,20]
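With more than two rows, `int(n_samples * p)` truncates every share, so the counts can add up to less than the target, and the "remainder from a2" step above only covers the two-list case. One possible generalization is largest-remainder rounding: floor every share, then give the leftover picks to the rows with the largest fractional parts. The helper name `allocate_counts` is hypothetical, and the sketch assumes the probabilities sum to 1.

```python
import math

def allocate_counts(total, probs):
    """Split `total` into integer counts proportional to `probs`.

    Uses largest-remainder rounding; assumes the probs sum to 1.
    """
    shares = [total * p for p in probs]
    counts = [math.floor(s) for s in shares]
    leftover = total - sum(counts)
    # row positions ordered by fractional part of their share, largest first
    order = sorted(range(len(probs)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

print(allocate_counts(100, [0.2, 0.1, 0.4, 0.3]))   # [20, 10, 40, 30]
print(allocate_counts(6, [0.4, 0.6]))               # [2, 4]
```

The second call reproduces the 2 and 4 samples that the example above obtains for a1 and a2.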


Answered by wahaha1021
