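Python JSON serialization excluding certain fields

Summary

I have a Python object hierarchy that I want to serialize to JSON (using only https://docs.python.org/3/library/json.html, without any extra third-party libraries). I want to exclude certain fields/properties/sub-objects. I am finding it surprisingly hard to find a simple answer on how to achieve this.

Example

I will end up with an instance of a derived class like this: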

class MyItemClass(BaseItemClass):
    def __init__(self):
        super().__init__()
        self.saveThisProperty = 999
        self.dontSaveThisProperty = "Something"
        self.saveThisObject = ObjectType1()
        self.dontSaveThisObject = ObjectType2()

If I were serializing to XML, I would want it to look like

<MyItemClass>
    <saveThisProperty>999</saveThisProperty>
    <saveThisObject>
        ...
    </saveThisObject>
</MyItemClass>

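Note that I only serialize certain properties/sub-objects, and I do not want to serialize the whole BaseItemClass that the class instance derives from.

In XML I'm fine. I know how to output the bits of XML I want as I go along, either into a temporary in-memory document that I save at the end, or by writing individual nodes/elements to a stream incrementally. I don't have to serialize everything. For example: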

xmlStream.writeStartElement("MyItemClass")
    xmlStream.writeElementWithValue("saveThisProperty",999)
    xmlStream.writeStartElement("saveThisObject")
        ...
    xmlStream.writeEndElement("saveThisObject")
xmlStream.writeEndElement("MyItemClass")

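For JSON I can't do this, can I? Do I have to create some new, "standalone" object hierarchy (with no derivation from BaseItemClass) by copying only the properties/sub-objects I want into it, and then JSON-serialize that?

I do see that there is json.dump(default = ...), but the documentation says:

If specified, default should be a function that gets called for objects that can't otherwise be serialized. It should return a JSON encodable version of the object

However, it is not that the original objects cannot be serialized Python -> JSON by default. Rather, I do not want that default, serialize-everything behavior; I want my own "selective" one.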

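I can think of three solutions for your situation:

Solution 1: Use the Pykson third-party library and define the fields you want serialized as pykson fields.

Example: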

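import pykson

class MyItemClass(pykson.JsonObject):
    # Only attributes declared as pykson fields are serialized
    saved_property = pykson.IntegerField()

my_object = MyItemClass(saved_property=1, accept_unknown=True)
my_object.unsaved_property = 2  # not a declared field, so it is left out
pykson.Pykson().to_json(my_object)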

Disclaimer: I am the developer of the pykson library.

Solution 2: The second solution is to use a wrapper class together with a custom default handler.

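import json

class ObjectWrapper:
    def __init__(self, value, should_serialize=False):
        self.value = value
        self.should_serialize = should_serialize

def default_handler(obj):
    # json calls this only for objects it cannot encode natively
    if isinstance(obj, ObjectWrapper):
        # Note: None is written as null, so the key itself is still emitted
        return obj.value if obj.should_serialize else None
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# my_object stands for whatever structure you are serializing
json.dumps(my_object, default=default_handler)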
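One caveat with the wrapper approach is that returning None from the handler produces a null value, so the excluded key still shows up in the output. If the key should disappear entirely, the wrappers can be resolved before encoding. The following is only a sketch under that assumption; the helper name unwrap and the _SKIP sentinel are made up for illustration, and ObjectWrapper is repeated so the snippet runs on its own.

```python
import json

class ObjectWrapper:
    def __init__(self, value, should_serialize=False):
        self.value = value
        self.should_serialize = should_serialize

_SKIP = object()  # hypothetical sentinel marking "leave this entry out"

def unwrap(data):
    """Recursively replace wrappers by their values and drop excluded ones."""
    if isinstance(data, ObjectWrapper):
        return unwrap(data.value) if data.should_serialize else _SKIP
    if isinstance(data, dict):
        items = ((key, unwrap(value)) for key, value in data.items())
        return {key: value for key, value in items if value is not _SKIP}
    if isinstance(data, (list, tuple)):
        return [value for value in map(unwrap, data) if value is not _SKIP]
    return data

payload = {
    "saveThisProperty": ObjectWrapper(999, should_serialize=True),
    "dontSaveThisProperty": ObjectWrapper("Something"),
    "nested": [ObjectWrapper(1, should_serialize=True), ObjectWrapper(2)],
}
result = json.dumps(unwrap(payload))
```

Here the excluded entries never reach json.dumps, so no default handler is needed at all. The sketch does not handle the case where the top-level object itself is an excluded wrapper.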

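Solution 3: This may not be a good idea, but if your hierarchy is deep you can also add a function to every class that will be serialized, use that function to get a dictionary, and then convert the dictionary to JSON.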

import json

class MyChildClass:
    def __init__(self, serialized_property, not_serialized_property):
        self.serialized_property = serialized_property
        self.not_serialized_property = not_serialized_property

    def to_dict(self):
        # only add the serialized property here
        return {
            "serialized_property": self.serialized_property
        }

class MyParentClass:
    def __init__(self, child_property, some_other_property):
        self.child_property = child_property
        self.some_other_property = some_other_property

    def to_dict(self):
        # convert the child explicitly so the result contains only plain types
        return {
            'child_property': self.child_property.to_dict(),
            'some_other_property': self.some_other_property,
        }

my_child_object = MyChildClass(serialized_property=1, not_serialized_property=2)
my_parent_object = MyParentClass(child_property=my_child_object, some_other_property='some string here')
json.dumps(my_parent_object.to_dict())

Or you can get the same result with a default handler:

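class MyParentClass:
    def __init__(self, child_property, some_other_property):
        self.child_property = child_property
        self.some_other_property = some_other_property

    def to_dict(self):
        # child_property is returned as-is; handle_default converts it later
        return {
            'child_property': self.child_property,
            'some_other_property': self.some_other_property,
        }

def handle_default(obj):
    # json calls this only for objects it cannot encode natively
    if isinstance(obj, (MyChildClass, MyParentClass)):
        return obj.to_dict()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# MyChildClass is the class defined in the previous example
my_child_object = MyChildClass(serialized_property=1, not_serialized_property=2)
my_parent_object = MyParentClass(child_property=my_child_object, some_other_property='some string here')
json.dumps(my_parent_object, default=handle_default)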
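If writing a to_dict() for every class gets repetitive, the same default-handler idea can be driven by a per-class list of attribute names. This is just a sketch and not part of the json module: the class attribute json_fields and the class names Child and Parent are assumptions chosen for the example.

```python
import json

class Child:
    json_fields = ("serialized_property",)  # hypothetical whitelist attribute

    def __init__(self, serialized_property, not_serialized_property):
        self.serialized_property = serialized_property
        self.not_serialized_property = not_serialized_property

class Parent:
    json_fields = ("child_property", "some_other_property")

    def __init__(self, child_property, some_other_property):
        self.child_property = child_property
        self.some_other_property = some_other_property

def handle_default(obj):
    # Only classes that declare json_fields are handled; anything else is an error
    fields = getattr(type(obj), "json_fields", None)
    if fields is None:
        raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
    return {name: getattr(obj, name) for name in fields}

parent = Parent(Child(1, 2), "some string here")
encoded = json.dumps(parent, default=handle_default)
```

Because the handler returns a dict that may still contain custom objects, json calls it again for each nested object, so the whitelist is applied at every level of the hierarchy.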

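I am the OP. For clarity, I am posting here what I ended up using for my case.

I have marked @Sina Rezaei's post in this thread as the accepted solution, because that (the last section of his post) and @snakechamerb's comments helped me understand what was required.

The outline of my solution is as follows: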

import json
# QGraphicsScene and QAbstractListModel come from whichever Qt binding the project uses

class ModelScene(QGraphicsScene):

  # Serialize whole scene to JSON into stream
  def json_serialize(self, stream) -> None:
    # Get `json.dump()` to call `ModelScene.json_serialize_dump_obj()` on every object it cannot encode itself
    json.dump(self, stream, indent=4, default=ModelScene.json_serialize_dump_obj)

  # Static method to be called from `json.dump(default=ModelScene.json_serialize_dump_obj)`
  # json calls it for every object that is not a natively serializable type
  @staticmethod
  def json_serialize_dump_obj(obj):
    # if object has a `json_dump_obj()` method call that...
    if hasattr(obj, "json_dump_obj"):
      return obj.json_dump_obj()
    # ...else report the error; returning `obj` unchanged would trigger a circular reference error
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

  # Return dict object suitable for serialization via json.dump()
  # This one is in `ModelScene(QGraphicsScene)` class
  def json_dump_obj(self) -> dict:
    return {
      "_classname_": self.__class__.__name__,
      "node_data": self.node_data
      }

class CanvasModelData(QAbstractListModel):

  # Return dict object suitable for serialization via json.dump()
  # This one is class CanvasModelData(QAbstractListModel)
  def json_dump_obj(self) -> dict:
    _data = {}
    for key, value in self._data.items():
      _data[key] = value
    return {
      "_classname_": self.__class__.__name__,
      "data_type": self.data_type,
      "_data": _data
      }

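Every "complex" class defines a def json_dump_obj(self) -> dict: method.
That method returns only the attributes/sub-objects wanted for serialization.
The top-level json.dump(self, stream, default=ModelScene.json_serialize_dump_obj) causes each node it visits to be serialized incrementally to the stream, going through the static method ModelScene.json_serialize_dump_obj. That calls my obj.json_dump_obj() if there is one; basic object types are handled by the default JSON serialization.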
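To see which objects actually reach the default hook, here is a small Qt-free sketch. The Node class, the hook function, and the seen list are made up for the illustration; only json.dump and its default parameter come from the standard library.

```python
import io
import json

class Node:
    def __init__(self, name, internal_state):
        self.name = name
        self.internal_state = internal_state  # deliberately not serialized

    def json_dump_obj(self) -> dict:
        return {"_classname_": type(self).__name__, "name": self.name}

seen = []  # records the type of every object passed to the hook

def hook(obj):
    seen.append(type(obj).__name__)
    if hasattr(obj, "json_dump_obj"):
        return obj.json_dump_obj()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

stream = io.StringIO()
json.dump({"count": 2, "nodes": [Node("a", "x"), Node("b", "y")]}, stream, default=hook)
text = stream.getvalue()
```

After this runs, seen contains only the two Node entries: the outer dict, the list, the integer and the strings are encoded directly and never passed to the hook.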

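Interestingly, I came across someone with the same concern as mine. From "What is the difference between json.dump() and json.dumps() in python?", the answer at https://stackoverflow.com/a/57087055/489865:

Memory usage and speed.

When you call jsonstr = json.dumps(mydata), it first creates a full copy of your data in memory, and only then do you file.write(jsonstr) it to disk. So this is the faster method, but it can be a problem if you have a large amount of data to save.

When you call json.dump(mydata, file), without the 's', no new memory is used, because the data is dumped in chunks. But the whole process is about 2 times slower.

Source: I checked the source code of json.dump() and json.dumps(), and also measured the time with time.time() and watched the memory usage in htop.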
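The chunked behavior described in that answer can be observed with a file-like object that records each write() call. This is a sketch assuming CPython's standard json module; CountingWriter is a hypothetical helper, and it demonstrates only the chunking, not the timing or memory figures quoted above.

```python
import json

class CountingWriter:
    """Minimal file-like object that records every write() call."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)
        return len(text)

data = {"saveThisProperty": 999, "items": list(range(5))}

writer = CountingWriter()
json.dump(data, writer)            # written piece by piece
whole = json.dumps(data)           # built as one string in memory
streamed = "".join(writer.chunks)  # reassemble the pieces for comparison
```

Both calls produce the same JSON text; the difference is that json.dump hands it to the stream in several small pieces instead of building one large string first.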
