Decompression problems with .bz2 files in pre-release

Hi,

we’ve tried downloading files from the new release, but some files have errors, e.g.,
downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang%3Den.ttl.bz2

This one, for example, cannot be unpacked with 7Zip (reports an error after 1757k). Other files (e.g., downloads.dbpedia.org/repo/lts/mappings/specific-mappingbased-properties/2019.09.01/specific-mappingbased-properties_lang%3Den.ttl.bz2) are fine though.

Could you please have a look and fix this?

Thanks,
Heiko

Hi @heikopaulheim,
I could not confirm this:

 wget http://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang%3den.ttl.bz2
--2020-02-04 07:31:54--  http://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang%3den.ttl.bz2
Resolving downloads.dbpedia.org (downloads.dbpedia.org)... 139.18.16.66
Connecting to downloads.dbpedia.org (downloads.dbpedia.org)|139.18.16.66|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 119919822 (114M) [application/octet-stream]
Saving to: ‘mappingbased-objects_lang=en.ttl.bz2’

mappingbased-objects_lang=en.ttl.bz2                 100%[======================================================================================================================>] 114,36M  2,69MB/s    in 46s     

2020-02-04 07:32:40 (2,51 MB/s) - ‘mappingbased-objects_lang=en.ttl.bz2’ saved [119919822/119919822]

shellmann@bossbrainz:/tmp$ bzip2 -t mappingbased-objects_lang\=en.ttl.bz2 
shellmann@bossbrainz:/tmp$ 
shellmann@bossbrainz:/tmp$ lbzip2 -t mappingbased-objects_lang\=en.ttl.bz2 
shellmann@bossbrainz:/tmp$ 

We normally use bzip2, lbzip2 and Apache Compress as these are the most interoperable. pbzip2 has some issues. 7Zip might as well.

Another thing: our .bz2 files are written in partitions, i.e. as several bzip2 streams concatenated into one file. Some decompressors need a flag so that they extract all partitions instead of stopping after the first one:
https://commons.apache.org/proper/commons-compress/apidocs/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.html#BZip2CompressorInputStream-java.io.InputStream-boolean-

    decompressConcatenated - if true, decompress until the end of the input; if false, stop after the first .bz2 stream and leave the input position to point to the next byte after the .bz2 stream

This is counterintuitive, so please check whether 7Zip has a similar setting.
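To illustrate the concatenated-stream behaviour, here is a minimal Python sketch using only the standard-library bz2 module. The two byte strings are made-up sample data, not DBpedia content; the point is only that a decoder which handles a single stream stops early, while one that continues across stream boundaries returns everything.

```python
import bz2

# Build a two-stream archive: each part is compressed on its own and the
# compressed bytes are concatenated (same layout as `cat a.bz2 b.bz2`).
# The payloads are invented sample data for this sketch.
part_a = b"first stream\n"
part_b = b"second stream\n"
archive = bz2.compress(part_a) + bz2.compress(part_b)

# A single BZ2Decompressor object handles exactly one stream and then stops.
single = bz2.BZ2Decompressor()
first_only = single.decompress(archive)
leftover = single.unused_data  # still-compressed bytes of the second stream

# bz2.decompress() continues across stream boundaries.
everything = bz2.decompress(archive)

print(first_only)
print(len(leftover) > 0)
print(everything)
```

A tool that behaves like the single decompressor object would report success or an error partway through the file even though the file itself is intact, which is consistent with bzip2 -t and lbzip2 -t passing above.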

Thanks, it seems to be a 7Zip issue indeed. I’ve searched the 7Zip configuration options, but there seems to be no suitable switch.

However, I’ve tried the same file with WinRar and it’s fine.

OK, my suggestion here is clearly to install Linux/Ubuntu or a proper terminal, so you can do things like:

curl http://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang%3den.ttl.bz2 | bzcat | cut -f1 -d '>' | LC_ALL=C sort | uniq -c | LC_ALL=C sort -nr

which gives you the outdegree of subjects, for example. Windows tools might slow you down in the long term.
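For anyone who cannot switch to a Unix shell, roughly the same count can be sketched in Python with the standard library. The function names subject_outdegree and outdegree_from_bz2 are hypothetical, and the sketch assumes one triple per line with the subject as the first `<...>` term. Unlike the cut-based pipeline, it skips blank lines and lines starting with `#`.

```python
import bz2
from collections import Counter


def subject_outdegree(lines):
    """Count how many triples each subject appears in (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Assumption: blank lines and '#' comment lines carry no triple.
        if not line.strip() or line.startswith("#"):
            continue
        # Same idea as `cut -f1 -d '>'`: keep the text before the first '>'.
        subject = line.split(">", 1)[0]
        counts[subject] += 1
    return counts


def outdegree_from_bz2(path):
    """Stream a local .bz2 file line by line; bz2.open reads all streams."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return subject_outdegree(handle)


if __name__ == "__main__":
    # Tiny invented sample instead of the real download.
    sample = [
        "<http://example.org/A> <http://example.org/p> <http://example.org/B> .\n",
        "<http://example.org/A> <http://example.org/p> <http://example.org/C> .\n",
        "<http://example.org/B> <http://example.org/p> <http://example.org/C> .\n",
    ]
    for subject, n in subject_outdegree(sample).most_common():
        print(n, subject + ">")
```

Calling outdegree_from_bz2 on the downloaded file and then most_common() on the result gives the subjects ordered by descending count, like the final sort -nr step.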