Support for reading and writing SPDX 3.0.1 documents using spdx-python-model #898

Open
bd-jrobson wants to merge 1 commit into spdx:main from bd-jrobson:spdx3-features

@bd-jrobson

This PR adds reading and writing SPDX 3.0.1 documents to this library by using the spdx-python-model.

There are a lot of changes here; most of them are in the src/spdx_tools/spdx3/model directory. They make the model match SPDX 3.0.1 by deleting classes that do not exist in it, adding new ones, or adjusting fields. For example, creation_info: Optional[CreationInfo] became creation_info: CreationInfo since CreationInfo is mandatory, and homepage became home_page so that snake case to camel case conversion works when converting between Python naming and SPDX 3 naming.
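To illustrate why the rename from homepage to home_page matters, here is a minimal sketch of a snake case to camel case conversion. The function names snake_to_camel and camel_to_snake are hypothetical and are not taken from this PR; the actual conversion code in the binding directory may work differently.

```python
import re


def snake_to_camel(name: str) -> str:
    """Convert a Python-style name such as home_page to homePage."""
    head, *rest = name.split("_")
    # Upper-case only the first letter of each following part.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def camel_to_snake(name: str) -> str:
    """Convert a camel case name such as homePage to home_page."""
    # Insert an underscore before every capital letter except a leading one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


if __name__ == "__main__":
    for field in ("home_page", "homepage", "creation_info"):
        print(field, "->", snake_to_camel(field))
```

With a purely mechanical conversion like this, a field named homepage has no underscore to split on, so it cannot be mapped to a camel case name with a capital P, while home_page can.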

Much like #897, this uses the spdx-python-model for serialization and deserialization. It's quite a coincidence that we ended up creating MRs within a day or so of each other! That SpdxObjectSet is a good idea!

The code for converting between the models in this library and spdx-python-model is in the src/spdx_tools/spdx3/binding directory.

shacl_to_spdx3_converter.py has a convert_to_payload method that accepts a v3_0_1.SHACLObjectSet and converts it into a Payload.

The Spdx3ToSHACLConverter class in spdx3_to_shacl_converter.py has a convert method that accepts a Payload and converts it into a v3_0_1.SHACLObjectSet.

The helpers.py file contains code used by both classes.

How to use

Reading a file

To read a file, use jsonld_parser.parse_from_file, which parses from either a string or a file path. A sample from the test code:

...
from spdx_tools.spdx3.parser.jsonld.jsonld_parser import parse_from_file
...

def test_parse_from_file():
    file_path = os.path.join(os.path.dirname(__file__), "../../data/spdxV3-example.json")
    payload = parse_from_file(file_path)

This produces a Payload, a class that already existed in the codebase.

Writing a file

To write an SPDX 3 file in the JSON-LD format, which is the only one supported, use json_ld_writer.write_payload. This is the same way it is done in the current code on main. Another sample from the test code:

...
from spdx_tools.spdx3.writer.json_ld.json_ld_writer import write_payload
...

def test_json_writer():
    spdx2_document: Spdx2_Document = document_fixture()
    payload: Payload = bump_spdx_document(spdx2_document)

    # this currently generates an actual file to look at, this should be changed to a temp file later
    with resources.as_file(resources.files("tests.spdx3.writer.json_ld").joinpath("SPDX3_jsonld_test")) as output_file:
        write_payload(payload, str(output_file))

This code generates a Payload, and uses write_payload to write it to a file.

Creating a Payload via code

This is done in much the same way as it would be on main right now. Create your objects using the classes in src/spdx_tools/spdx3/model and add them to a Payload, with elements referencing other elements by SPDX ID.

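from datetime import datetime, timezone

# Imports of the model classes and Payload are omitted here.
agent_id = "urn:agent.com/id"
document_id = "urn:document.com/id"  # placeholder SPDX ID for illustration
creation_info = CreationInfo(
    Version("3.0.1"), datetime.now().astimezone(timezone.utc), [agent_id]
)
agent = Agent(agent_id, creation_info)

document = SpdxDocument(document_id, "Document Name", [], [], creation_info=creation_info)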

payload = Payload()
payload.add_element(agent)
payload.add_element(document)

Bump and console writer

The existing "bump" classes and tests were updated to match the new SPDX 3 models.

The console writer code in src/spdx_tools/spdx/writer/console was barely updated. A few writers that matched classes that no longer exist were removed, as were references to now-removed fields. No writers were added for the new classes, so the console writers are incomplete.

SPDX 3 test files

The three new SPDX 3 test documents in tests/spdx3/data were sourced from:

With the last one being modified slightly. These are all CC0-1.0 according to the repository's README.

SPDX 2.3 RDF change

The change to src/spdx_tools/spdx/writer/rdf/writer_utils.py is because of what appears to be a bug with the Java SPDX library. This could be split off into its own MR, or directed towards the Java SPDX maintainers.

This library was serializing a download location with a value of noassertion as <spdx:downloadLocation rdf:resource="http://spdx.org/rdf/terms#noassertion"/>, which is a valid representation according to both this library and an example in section 7.7.3 of the spec (https://spdx.github.io/spdx-spec/v2.3/package-information/#773-examples). This library allows a downloadLocation to carry its noassertion/none value either in the rdf:resource attribute as a URI, as above, or as the literal value of the element, i.e. <spdx:downloadLocation>NOASSERTION</spdx:downloadLocation>.

The problem is that the Java SPDX library only accepts <spdx:downloadLocation>NOASSERTION</spdx:downloadLocation>, so I made this library serialize it in that manner.

This can be verified by using this library (not on this branch) to convert an SPDX 2.3 file containing download locations with noassertion values to JSON format, then to RDF, and then having the SPDX Java library or the SPDX online tool validate it. Validation will fail because the conversion to RDF uses the rdf:resource form. Doing the same with this branch should result in no validation errors.
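The two downloadLocation forms discussed above can be compared with a small standard-library sketch. It is not this library's parser and does not use rdflib; the function name download_location_value is hypothetical, and upper-casing the URI fragment is an assumption made only so both forms yield the same string.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SPDX_NS = "http://spdx.org/rdf/terms#"


def download_location_value(xml_fragment: str) -> str:
    """Read a downloadLocation element in either serialization form."""
    element = ET.fromstring(xml_fragment)
    resource = element.get(f"{{{RDF_NS}}}resource")
    if resource is not None:
        # URI form: the value is the fragment after the SPDX terms namespace.
        if resource.startswith(SPDX_NS):
            return resource[len(SPDX_NS):].upper()
        return resource
    # Literal form: the value is the element's text.
    return (element.text or "").strip()


NAMESPACES = f'xmlns:spdx="{SPDX_NS}" xmlns:rdf="{RDF_NS}"'

# Form written on main: value carried in the rdf:resource attribute.
URI_FORM = (
    f'<spdx:downloadLocation {NAMESPACES} '
    f'rdf:resource="{SPDX_NS}noassertion"/>'
)
# Form written on this branch: value carried as the element's literal text.
LITERAL_FORM = (
    f"<spdx:downloadLocation {NAMESPACES}>NOASSERTION</spdx:downloadLocation>"
)

if __name__ == "__main__":
    print(download_location_value(URI_FORM))
    print(download_location_value(LITERAL_FORM))
```

A reader that handles both branches, as in this sketch, treats the two forms as equivalent; a reader that only looks at the element text would see an empty value for the rdf:resource form.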

Signed-off-by: Johnathan Robson <jrobson@blackduck.com>