Discussion:
Bug#753809: ginkgocadx and certain dicom files
Gert Wollny
2016-01-28 21:45:36 UTC
Permalink
Hi all,

I think I found the culprit thanks to the file Karsten provided in the
bug report:

The offending tag is

(7fe0,0010) OW

i.e. the pixel data is encoded in big endian, and in gdcm the code is
commented with

gdcm-2.6.1/Source/DataStructureAndEncodingDefinition/gdcmVR.cxx:290
// Optimized version for transforming a read VR from DICOM file
// into a VRType (does not support OB_OW for instance)

so it seems that gdcm doesn't support big endian data, or only
sometimes:

* I tried dicomtonifti from vtk-dicom, and here the file is read.
* I tried gdcmdump and here the file is rejected:
"Failed to read: 2.dcm"

What also surprises me bit is that neither ITK nor ginkodadx fall back
to the dcmtk module, when reading fails with gdcm.

AFAIK Aeskulap uses only dcmtk, so no surprise that it accepts the
file.

Amide via (x)medcon also doesn't read the file, but via dcmtk it works
(obviously).

Now the question is, does ginkgocadx change the endianess when it is
uploading files to it's cloud?

Best,
Gert
Karsten Hilbert
2016-01-29 09:54:07 UTC
Permalink
GDCM does handle big endian transfer syntax correctly. The input file
[release 7e32e2d] Add a fallback for some dataset with invalid group 2 element
1 file changed, 26 insertions(+)
Given the above, GinkgoCADx seems to _make_ them files bogus
during the roundtrip Ginkgo->PACS->Ginkgo.

I'll get in touch with Orthanc again, to see whether Orthanc
does or does not fiddle with the file during up-/download.

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
David Clunie
2016-01-31 14:23:16 UTC
Permalink
I agree with Mathieu Malaterre. This is an invalid DICOM
file, so the bug is in the source system, not the
recipient.

The error is caused by the (0002,0003) data element being
repeated:

- once within the meta information header (within the length
specified by group length (0002,0003))

- once at the beginning of the data set

which can be interpreted as being incorrect in various
ways:

- group 0002 data elements may not occur outside the meta
information header, as Mathieu points out

- data elements may not be repeated

- data elements must be in ascending numerical order

Since the meta information is encoded in explicit VR and
the data set in implicit VR, the two copies of (0002,0003)
are encoded slightly differently, but they do contain the
same value.

I mention this because it explains why toolkits behave
differently with this illegal file (as Mathieu and Jörg
have discussed in comp.protocols.dicom):

- those that determine the end of the meta information
header based on the (mandatory) group length of group
0002 may switch transfer syntax from explicit to implicit
before the offending tag and read it, and then may or
may not complain about its presence being illegal
(duplicate or out of order), or may just silently
ignore it

- those that use the change of group number from 0002 to
something higher to detect the end of the meta header may
not read the offending implicit VR encoded extra out of
order group 0002 data element correctly if they try to
parse it as explicit VR, and get out of sync

The Implementation Version Name in the meta information header
says "OFFIS_DCMTK_360", but dcmtk doesn't make this kind
of error, as far as I know, so Karsten, it would be good
to know what system actually encoded this bad DICOM file.

David

% andump debian_753809_badmetaelement.dcm
(0x0002,0x0000) UL File Meta Information Group Length VR=<UL>
VL=<0x0004> [0x000000d8]
(0x0002,0x0001) OB File Meta Information Version VR=<OB> VL=<0x0002>
[0x00,0x01]
(0x0002,0x0002) UI Media Storage SOP Class UID VR=<UI> VL=<0x001a>
<1.2.840.10008.5.1.4.1.1.1>
(0x0002,0x0003) UI Media Storage SOP Instance UID VR=<UI>
VL=<0x003a> <1.2.276.0.7230010.3.1.4.2831160242.3411.1400154379.583031>
(0x0002,0x0010) UI Transfer Syntax UID VR=<UI> VL=<0x0012>
<1.2.840.10008.1.2>
(0x0002,0x0012) UI Implementation Class UID VR=<UI> VL=<0x001c>
<1.2.276.0.7230010.3.0.3.6.0>
(0x0002,0x0013) SH Implementation Version Name VR=<SH> VL=<0x0010>
<OFFIS_DCMTK_360 >
(0x0002,0x0016) AE Source Application Entity Title VR=<AE>
VL=<0x0008> <GNUMEDLH>
(0x0002,0x0003) UI Media Storage SOP Instance UID VR=<UN>
VL=<0x003a>
[0x31,0x2e,0x32,0x2e,0x32,0x37,0x36,0x2e,0x30,0x2e,0x37,0x32,0x33,0x30,0x30,0x31,0x30,0x2e,0x33,0x2e,0x31,0x2e,0x34,0x2e,0x32,0x38,0x33,0x31,0x31,0x36,0x30,0x32,0x34,0x32,0x2e,0x33,0x34,0x31,0x31,0x2e,0x31,0x34,0x30,0x30,0x31,0x35,0x34,0x33,0x37,0x39,0x2e,0x35,0x38,0x33,0x30,0x33,0x31,0x00]

(0x0008,0x0000) UL Group Length VR=<UL> VL=<0x0004> [0x000001d2]
(0x0008,0x0005) CS Specific Character Set VR=<CS> VL=<0x000a>
<ISO_IR 100>
...

% dcdump -ignoreoutofordertags debian_753809_badmetaelement.dcm
Warning - Tags out of order - trailing garbage, wrong transfer syntax,
or not valid DICOM
(0x0002,0x0003) UI Media Storage SOP Instance UID Error - Tag read
failed - Implicit VR encoding even though supposed to be explicit
Warning - Bad group length - Group 0x2 specified as 0xd8 actually 0x11a
Error - Dicom dataset read failed
(0x0002,0x0000) UL File Meta Information Group Length VR=<UL>
VL=<0x0004> [0x000000d8]
(0x0002,0x0001) OB File Meta Information Version VR=<OB> VL=<0x0002>
[0x00,0x01]
(0x0002,0x0002) UI Media Storage SOP Class UID VR=<UI> VL=<0x001a>
<1.2.840.10008.5.1.4.1.1.1>
(0x0002,0x0003) UI Media Storage SOP Instance UID VR=<UI>
VL=<0x003a> <1.2.276.0.7230010.3.1.4.2831160242.3411.1400154379.583031>
(0x0002,0x0003) UI Media Storage SOP Instance UID VR=<UI>
VL=<0x003a> <1.2.276.0.7230010.3.1.4.2831160242.3411.1400154379.583031>
(0x0002,0x0010) UI Transfer Syntax UID VR=<UI> VL=<0x0012>
<1.2.840.10008.1.2>
(0x0002,0x0012) UI Implementation Class UID VR=<UI> VL=<0x001c>
<1.2.276.0.7230010.3.0.3.6.0>
(0x0002,0x0013) SH Implementation Version Name VR=<SH> VL=<0x0010>
<OFFIS_DCMTK_360 >
(0x0002,0x0016) AE Source Application Entity Title VR=<AE>
VL=<0x0008> <GNUMEDLH>
(0x0008,0x0000) UL Group Length VR=<UL> VL=<0x0004> [0x000001d2]
(0x0008,0x0005) CS Specific Character Set VR=<CS> VL=<0x000a>
<ISO_IR 100>
...

% java -cp pixelmed.jar com.pixelmed.dicom.AttributeList
debian_753809_badmetaelement.dcm
Error: Illegal duplicate tag in dataset - (0x0002,0x0003) - replacing
previous occurence
[main] INFO com.pixelmed.dicom.AttributeList Dumping ...
(0x0002,0x0000) FileMetaInformationGroupLength VR=<UL> VL=<0x4> [0xd8]
(0x0002,0x0001) FileMetaInformationVersion VR=<OB> VL=<0x2> []
(0x0002,0x0002) MediaStorageSOPClassUID VR=<UI> VL=<0x1a>
<1.2.840.10008.5.1.4.1.1.1>
(0x0002,0x0003) MediaStorageSOPInstanceUID VR=<UI> VL=<0x3a>
<1.2.276.0.7230010.3.1.4.2831160242.3411.1400154379.583031>
(0x0002,0x0010) TransferSyntaxUID VR=<UI> VL=<0x12> <1.2.840.10008.1.2>
(0x0002,0x0012) ImplementationClassUID VR=<UI> VL=<0x1c>
<1.2.276.0.7230010.3.0.3.6.0>
(0x0002,0x0013) ImplementationVersionName VR=<SH> VL=<0x10>
<OFFIS_DCMTK_360 >
(0x0002,0x0016) SourceApplicationEntityTitle VR=<AE> VL=<0x8> <GNUMEDLH>
(0x0008,0x0000) VR=<UN> VL=<0x4> [***@12edcd21
(0x0008,0x0005) SpecificCharacterSet VR=<CS> VL=<0xa> <ISO_IR 100>
...
Karsten Hilbert
2016-01-31 18:15:18 UTC
Permalink
Post by David Clunie
I agree with Mathieu Malaterre. This is an invalid DICOM
file, so the bug is in the source system, not the
recipient.
Well, that may be true (I can't verify).

However, that fact doesn't have anything to do with the bug I reported.

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert
2016-01-31 18:16:41 UTC
Permalink
Post by Karsten Hilbert
Post by David Clunie
I agree with Mathieu Malaterre. This is an invalid DICOM
file, so the bug is in the source system, not the
recipient.
Well, that may be true (I can't verify).
However, that fact doesn't have anything to do with the bug I reported.
HOWEVER, thank you for taking the time to add your insight
into this.

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert
2016-01-31 18:42:26 UTC
Permalink
Post by David Clunie
The Implementation Version Name in the meta information header
says "OFFIS_DCMTK_360", but dcmtk doesn't make this kind
of error, as far as I know, so Karsten, it would be good
to know what system actually encoded this bad DICOM file.
That doesn't matter because:

- initially, GinkgoCADx (and hence GDCM ?) reads the file
just fine from local storage (say, CD-ROM)
- it then sends the file to a PACS
- it then retrieves the file from the same PACS
- it then does not read the very same file anymore

I am CCing the PACS programmer such that he may reconfirm
that Orthanc doesn't modify DICOM files it receives for
storage via DICOM - Sebastien ?

The bug is:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=753809

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert
2016-01-31 18:45:19 UTC
Permalink
Post by Karsten Hilbert
Post by David Clunie
The Implementation Version Name in the meta information header
says "OFFIS_DCMTK_360", but dcmtk doesn't make this kind
of error, as far as I know, so Karsten, it would be good
to know what system actually encoded this bad DICOM file.
- initially, GinkgoCADx (and hence GDCM ?) reads the file
just fine from local storage (say, CD-ROM)
- it then sends the file to a PACS
- it then retrieves the file from the same PACS
- it then does not read the very same file anymore
And, may I add, this behaviour happens with DICOM files from
various (commercial) sources.

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Sébastien Jodogne
2016-01-31 20:10:48 UTC
Permalink
Orthanc will not modify the file by itself, except if the input file is
corrupted, which could lead to undefined behavior. Garbage in, garbage out.

Regards,
Sébastien-
Post by Karsten Hilbert
Post by Karsten Hilbert
Post by David Clunie
The Implementation Version Name in the meta information header
says "OFFIS_DCMTK_360", but dcmtk doesn't make this kind
of error, as far as I know, so Karsten, it would be good
to know what system actually encoded this bad DICOM file.
- initially, GinkgoCADx (and hence GDCM ?) reads the file
just fine from local storage (say, CD-ROM)
- it then sends the file to a PACS
- it then retrieves the file from the same PACS
- it then does not read the very same file anymore
And, may I add, this behaviour happens with DICOM files from
various (commercial) sources.
Karsten Hilbert
2016-01-31 20:37:30 UTC
Permalink
Post by Sébastien Jodogne
Orthanc will not modify the file by itself, except if the input file is
corrupted
I hadn't been made aware of this "except" so far. This means
more testing will need to be done because eventually Orthanc
_does_ modify input files sometimes. Sigh.

I will see whether I can manage to find out more.
Post by Sébastien Jodogne
which could lead to undefined behavior. Garbage in, garbage out.
Sure, but so far I'd have assumed "garbage in, SAME garbage out".

Note that I fully understand the desire to accept and store
the garbage (as [part of] the saying goes "be liberal in what
you accept"). However, it'd be helpful if there was either
documentation that Orthanc does indeed sometimes modify files
on its own or else if there was even a configuration option
for skip improworsening of DICOM files handed out. Orthanc
would be the tool of the day if it offered a REST field /
DICOM tag / whatever saying "Orthanc thinks this file is
broken because ..." and the web frontend would offer buttons
"display original", "display fixed", "fix permanently" ;-)

Yeah, I know, making suggestions is cheap... :-)

Thanks for everyone's work so far,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Sébastien Jodogne
2016-01-31 21:06:23 UTC
Permalink
Post by Karsten Hilbert
Post by Sébastien Jodogne
Orthanc will not modify the file by itself, except if the input file is
corrupted
I hadn't been made aware of this "except" so far. This means
more testing will need to be done because eventually Orthanc
_does_ modify input files sometimes. Sigh.
Well, Orthanc does not modify the *content* of the DICOM file, BUT
syntactic variants such as the padding or the explicit/implicit length
encoding might change.
Post by Karsten Hilbert
which could lead to undefined behavior. Garbage in, garbage out.
Sure, but so far I'd have assumed "garbage in, SAME garbage out".
Unfortunately, you cannot make this assumption, as the decoder might behave
in some undefined fashion when faced with a corrupted DICOM file.
Post by Karsten Hilbert
Note that I fully understand the desire to accept and store
the garbage (as [part of] the saying goes "be liberal in what
you accept"). However, it'd be helpful if there was either
documentation that Orthanc does indeed sometimes modify files
on its own or else if there was even a configuration option
for skip improworsening of DICOM files handed out. Orthanc
would be the tool of the day if it offered a REST field /
DICOM tag / whatever saying "Orthanc thinks this file is
broken because ..." and the web frontend would offer buttons
"display original", "display fixed", "fix permanently" ;-)
Yeah, I know, making suggestions is cheap... :-)
The philosophy of Orthanc is indeed to accept any DICOM file, as soon as it
can be parsed and it contains the 4 mandatory tags PatientID +
StudyInstanceUID + SeriesInstanceUID + SOPInstanceUID. This is the most
liberal approach in the DICOM (I often talk about "least common
denominator").

However, a third-party C/C++ plugin for Orthanc could be implemented to
check the validity of each incoming DICOM file thanks to David Clunie's
dicom3tools (more precisely by calling "dciodvfy"). This would allow to
track corrupted files in the way you suggest. Incoming files could be
tagged and/or send to a kind of purgatory.

Check out the "OrthancPluginRegisterOnStoredInstanceCallback()" function
for this purpose:
https://orthanc.chu.ulg.ac.be/sdk/group__Callbacks.html#ga1af7c8c9877aaf670208bfc53164b9fb

As always, because I have already way too much work but few development
support, I unfortunately cannot implement such a plugin by myself in the
next few months...
David Clunie
2016-02-01 13:14:57 UTC
Permalink
Hi Karsten
Post by Karsten Hilbert
Sure, but so far I'd have assumed "garbage in, SAME garbage out".
There is no reason to assume this.

If a DICOM storage SCP receives an invalidly constructed data
set, then it has no obligation to be able to cope with it,
though many implementations try very hard to clean up bad
stuff.

Even a valid data set is likely to be modified in various
minor or significant ways (coercion of identifiers, cleaning
up of padding, removal of retired group lengths, change in
fixed versus delimited sequence item encoding, and even
removal of private or non-standard or non-mandatory
attributes depending on the Storage SCP "level" that the
system supports). See, for example:

http://dicom.nema.org/medical/dicom/current/output/chtml/part04/sect_B.4.html#sect_B.4.1
Post by Karsten Hilbert
Post by David Clunie
Post by Sébastien Jodogne
Unfortunately, you cannot make this assumption, as the decoder might
behave
Post by Sébastien Jodogne
in some undefined fashion when faced with a corrupted DICOM file.
I would not have thought that decoding is needed when
retrieving files for storage over DICOM but that only shows
my igorance regarding that protocol :-)
Indeed, DICOM store piggybacks the DICOM instance together with the C-Store
command itself. This unfortunately opens the path to many compatibilty
problems related to non-conforming, proprietary implementations.
Note however that, in the context of Orthanc, if the DICOM file is sent
through the REST API, then this decoding/re-encoding chain does not take
place (accordingly to your intuition). This would also be the case if
DICOMweb STOW-RS is used.
A simple example of the importance of the correct encoding
of the data set and the need to be able to decode it, is
indexing ... if the data set cannot be parsed, then how can
one extract the patient and study and series identifiers to
use in the database so as to respond correctly to a query?

Or to put it another way, neither the data set that follows
the meta information header in a DICOM file, nor the data
set that follows a C-STORE command message on the network
(or an HTTP GET or POST), is an opaque bunch of bytes, but
has to be a correctly formed data set (just like an XML file
needs to be correctly formed to be parseable even if it
doesn't match the schema or DTD), if one is going to
extract anything from it.

Though many implementations try their best to cope with
really crappy data sets by fixing them, sometimes the
heuristics to fix one data set defect conflict with those
to fix other data set defects, so this gets really tricky.

Whether the data set arrives via a C-STORE or STOW makes
no difference to whether or not data set integrity and
compliance are required, at least beyond simply storing
the bytes (the DIMSE protocol is no more or less dependent
on the data set bytes after the command message than the
HTTP POST). Or more precisely, the command and data set
DIMSE messages are completely separated in terms of the way
they are packaged during encapsulation as Presentation
Data Values in Presentation Data Units, and the PDV
Message Header signals whether the fragment is a Command
or Data type; see:


http://dicom.nema.org/medical/dicom/current/output/chtml/part08/chapter_E.html

Unlike with STOW (or WADO-RS), one does not have to separate
the PS3.10 meta information header from the data set, because
there is no meta information header sent over the wire with
the DIMSE protocol.
Post by Karsten Hilbert
Post by David Clunie
I agree with Mathieu Malaterre. This is an invalid DICOM
file, so the bug is in the source system, not the
recipient.
Well, that may be true (I can't verify).
However, that fact doesn't have anything to do with the bug I reported.
Now hopefully you see why it does. What you reported does
not sound like a bug in the implementation, it is a "feature
request" to cope with a particular pattern of invalid input,
for which a fix might not necessarily lead to coping with
other patterns of invalid input.

Beyond that one bad file, a feature request to "store and index
any garbage that even vaguely resembles a DICOM instance" is
a pretty ambitious demand.

David
Karsten Hilbert
2016-02-01 13:26:54 UTC
Permalink
Hi David,

thanks for the clarification.
Post by David Clunie
Post by Karsten Hilbert
Sure, but so far I'd have assumed "garbage in, SAME garbage out".
There is no reason to assume this.
Indeed. I should have been more precise:

IF garbage in AND if garbage ACCEPTED, then SAME garbage
OUT if NOT UNAMBIGOUSLY fixable

Sounds right ? :)
Post by David Clunie
A simple example of the importance of the correct encoding
of the data set and the need to be able to decode it, is
indexing ... if the data set cannot be parsed, then how can
one extract the patient and study and series identifiers to
use in the database so as to respond correctly to a query?
Dang, there goes my logic :-)
Post by David Clunie
Post by Karsten Hilbert
However, that fact doesn't have anything to do with the bug I reported.
Now hopefully you see why it does.
I much better appreciate what is involved thanks to your
explanations.
Post by David Clunie
What you reported does not sound like a bug in the
implementation, it is a "feature
request" to cope with a particular pattern of invalid input,
for which a fix might not necessarily lead to coping with
other patterns of invalid input.
Beyond that one bad file, a feature request to "store and index
any garbage that even vaguely resembles a DICOM instance" is
a pretty ambitious demand.
What struck me as odd: It used to work ! Suddenly Ginkgo
CADx stopped redisplaying roundtripped studies from the same
original modality.

So, previously:

- get from modality onto CDROM
- read CDROM
- display fine
- send to orthanc
- retrieve from orthanc
- display fine

Now:

- get from very same modality onto CDROM
- read CDROM
- display fine
- send to orthanc
- retrieve from orthanc
- display breaks

However, I now see that "very same modality" might stand on
shaky ground as while it is certainly the same X-ray machine
the software handling that modality may have (and has)
changed at least in version.

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Sébastien Jodogne
2016-02-01 13:34:54 UTC
Permalink
Dear David,

Note however that, in the context of Orthanc, if the DICOM file is sent
Post by David Clunie
Post by Karsten Hilbert
through the REST API, then this decoding/re-encoding chain does not take
place (accordingly to your intuition). This would also be the case if
DICOMweb STOW-RS is used. [...]
Whether the data set arrives via a C-STORE or STOW makes
no difference to whether or not data set integrity and
compliance are required, at least beyond simply storing
the bytes (the DIMSE protocol is no more or less dependent
on the data set bytes after the command message than the
HTTP POST). [...]
I fully agree with you, but just for clarity sake: What I wrote above is
the description of an implementation choice of Orthanc.

If Orthanc receives a DICOM instance through REST or DICOMweb, the instance
will be stored as such on the hard disk (i.e. byte-to-byte identical). If
Orthanc receives an instance through a DIMSE message, the file might not be
byte-to-byte identical to the source instance (because things such as
repadding might occur). In both cases, if the DCMTK parser is not able to
decode the DICOM instance, or if one of the DICOM tags PatientID,
StudyInstanceUID, SeriesInstanceUID or SOPInstanceUID is missing, the
instance will not be stored by Orthanc.

This is how Orthanc is designed, as an answer to Karsten's question...
Obviously, I do not pretend this is the best convention, and other
DICOM/DICOMweb servers might not follow the same conventions as Orthanc. :)

The sense of my answer was also to say that a plugin could be developed to
turn Orthanc into a strict server that would only accept 100% conformant
DICOM instances.

Regards,
Sébastien-

Loading...