嵌套JSON总是很难正确处理。
几个月前,我想出了一种使用here中写得很漂亮的 flatten_json_iterative_solution 来提供“通用答案”的方法:每个方法都反复进行解压缩给定json的级别。
然后可以将其简单地转换为 Pandas.Series 然后是 Pandas.DataFrame ,如下所示:
df = pd.Series(flatten_json_iterative_solution(dict(json_))).to_frame().reset_index()
Intermediate Dataframe result
可以轻松地执行某些数据转换以将索引拆分为您要求的列名称:
df[["index","col1","col2","col3","col4"]] = df['index'].apply(lambda x: pd.Series(x.split('_')))
Final result
,
您可以使用此:
def flatten_dict(d):
""" Returns list of lists from given dictionary """
l = []
for k,v in sorted(d.items()):
if isinstance(v,dict):
flatten_v = flatten_dict(v)
for my_l in reversed(flatten_v):
my_l.insert(0,k)
l.extend(flatten_v)
elif isinstance(v,list):
for l_val in v:
l.append([k,l_val])
else:
l.append([k,v])
return l
此函数接收字典(包括值也可以是列表的嵌套)并将其展平为列表。
然后,您可以简单地:
df = pd.DataFrame(flatten_dict(my_dict))
my_dict
是您的JSON对象。
以您的示例为例,运行print(df)
时得到的是:
0 1 2 3 4
0 country1 AdUnit1 floor_price1 feature1 1111
1 country1 AdUnit1 floor_price1 feature2 1112
2 country1 AdUnit1 floor_price2 feature1 1121
3 country1 AdUnit2 floor_price1 feature1 1211
4 country1 AdUnit2 floor_price2 feature1 1221
5 country2 AdUnit1 floor_price1 feature1 2111
6 country2 AdUnit1 floor_price1 feature2 2112
在创建数据框时,您可以命名列和索引
,
您可以尝试这种方法:
from google.cloud import storage
import pandas as pd
storage_client = storage.Client()
bucket = storage_client.get_bucket('test-mvladoi')
blob = bucket.blob('file')
read_output = blob.download_as_string()
data = json.loads(read_output)
data_norm = json_normalize(data,max_level=5)
df = pd.DataFrame(columns=['col1','col2','col3','col4','col5'])
i = 0
for col in b.columns:
a,c,d,e = col.split('.')
df.loc[i] = [a,e,b[col][0]]
i = i + 1
print(df)
,
不是最好的方法,但是它是有效的。另外,您应该修改仅从此awnser
中选取的展平函数
test = {
"country1":{
"AdUnit1":{
"floor_price1":{
"feature1":1111,"feature2":1112
},"floor_price2":{
"feature1":1121
}
},"AdUnit2":{
"floor_price1":{
"feature1":1211
},"floor_price2":{
"feature1":1221
}
}
},"country2":{
"AdUnit1":{
"floor_price1":{
"feature1":2111,"feature2":2112
}
}
}
}
from collections import defaultdict
import pandas as pd
import collections
def flatten(d,parent_key='',sep='_'):
items = []
for k,v in d.items():
new_key = parent_key + sep + k if parent_key else k
if isinstance(v,collections.MutableMapping):
items.extend(flatten(v,new_key,sep=sep).items())
else:
items.append((new_key,v))
return dict(items)
results = defaultdict(list)
colnames = ["col1","col4","col5","col6"]
for key,value in flatten(test).items():
elements = key.split("_")
elements.append(value)
for colname,element in zip(colnames,elements):
results[colname].append(element)
df = pd.DataFrame(results)
print(df)
本文链接:https://www.f2er.com/3166537.html