Hi,
we’ve tried downloading files from the new release, but some files have errors, e.g.,
downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang%3Den.ttl.bz2
This one, for example, cannot be unpacked with 7Zip (reports an error after 1757k). Other files (e.g., downloads.dbpedia.org/repo/lts/mappings/specific-mappingbased-properties/2019.09.01/specific-mappingbased-properties_lang%3Den.ttl.bz2 ) are fine though.
Could you please have a look and fix this?
Thanks,
Heiko
kurzum
February 4, 2020, 6:35am
2
Hi @heikopaulheim ,
I could not confirm this:
wget http://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang%3den.ttl.bz2
--2020-02-04 07:31:54-- http://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang%3den.ttl.bz2
Resolving downloads.dbpedia.org (downloads.dbpedia.org)... 139.18.16.66
Connecting to downloads.dbpedia.org (downloads.dbpedia.org)|139.18.16.66|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 119919822 (114M) [application/octet-stream]
Saving to: ‘mappingbased-objects_lang=en.ttl.bz2’
mappingbased-objects_lang=en.ttl.bz2 100%[======================================================================================================================>] 114,36M 2,69MB/s in 46s
2020-02-04 07:32:40 (2,51 MB/s) - ‘mappingbased-objects_lang=en.ttl.bz2’ saved [119919822/119919822]
shellmann@bossbrainz:/tmp$ bzip2 -t mappingbased-objects_lang\=en.ttl.bz2
shellmann@bossbrainz:/tmp$
shellmann@bossbrainz:/tmp$ lbzip2 -t mappingbased-objects_lang\=en.ttl.bz2
shellmann@bossbrainz:/tmp$
We normally use bzip2, lbzip2 and Apache Compress, as these are the most interoperable. pbzip2 has some issues. 7Zip might as well.
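For anyone without the bzip2 command line tools, roughly the same kind of integrity check as bzip2 -t can be sketched with Python's standard bz2 module. This is only a sketch under assumptions: the function name bz2_is_intact and the temporary demo files are made up for illustration. It reads the whole archive and reports whether decompression finished without an error.

```python
import bz2
import os
import tempfile


def bz2_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole .bz2 file at `path` decompresses without error.

    Hypothetical helper: it reads and discards the output chunk by chunk,
    so memory use stays small even for large dumps.
    """
    try:
        with bz2.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError):
        # OSError: invalid data; EOFError: file ends before the stream does
        return False
    return True


# Demo with two throwaway files: one complete, one cut in half.
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.ttl.bz2")
    bad_path = os.path.join(tmp, "truncated.ttl.bz2")
    payload = bz2.compress(b"<s> <p> <o> .\n" * 1000)
    with open(good_path, "wb") as f:
        f.write(payload)
    with open(bad_path, "wb") as f:
        f.write(payload[: len(payload) // 2])
    good_ok = bz2_is_intact(good_path)
    bad_ok = bz2_is_intact(bad_path)
```

If this returns True for a downloaded file but another tool still fails on it, the tool is the more likely culprit than the download.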
kurzum
February 4, 2020, 6:44am
3
Another thing: a bzip2 file can consist of several partitions (concatenated .bz2 streams), and sometimes you need to set a flag on the decompressor so that it extracts all of them:
https://commons.apache.org/proper/commons-compress/apidocs/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.html#BZip2CompressorInputStream-java.io.InputStream-boolean-
decompressConcatenated - if true, decompress until the end of the input; if false, stop after the first .bz2 stream and leave the input position to point to the next byte after the .bz2 stream
This is counterintuitive, so please check how 7Zip handles it.
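To illustrate what that flag means, here is a small sketch that uses Python's standard bz2 module instead of Apache Commons Compress (a swapped-in library, chosen only because it needs no extra install). The helper names and the two sample triples are invented for the example. A single BZ2Decompressor stops at the end of the first stream, which corresponds to decompressConcatenated = false; looping over the leftover bytes corresponds to true.

```python
import bz2


def decompress_first_stream(data):
    """Decompress only the first .bz2 stream.

    Returns (output, leftover), where leftover is the raw input that
    follows the first stream.
    """
    decomp = bz2.BZ2Decompressor()
    out = decomp.decompress(data)
    return out, decomp.unused_data


def decompress_all_streams(data):
    """Decompress every concatenated .bz2 stream until the input is used up."""
    chunks = []
    while data:
        decomp = bz2.BZ2Decompressor()
        chunks.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        data = decomp.unused_data
    return b"".join(chunks)


# Two independently compressed parts, concatenated into one .bz2 payload
part_a = b"<s1> <p> <o1> .\n"
part_b = b"<s2> <p> <o2> .\n"
multi = bz2.compress(part_a) + bz2.compress(part_b)

first, rest = decompress_first_stream(multi)
everything = decompress_all_streams(multi)
```

Here first holds only part_a, while everything holds both parts. A tool that behaves like the first function on a multi-stream file would silently give you a shortened output or report an error partway through.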
Thanks, it seems to be a 7Zip issue indeed. I’ve searched the 7Zip configuration options, but there seems to be no suitable switch.
However, I’ve tried the same file with WinRar and it’s fine.
kurzum
February 4, 2020, 9:11am
5
OK, my suggestion here is clearly to install Linux/Ubuntu or a proper terminal, so you can do things like:
curl http://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang%3den.ttl.bz2 | bzcat | cut -f1 -d '>' | LC_ALL=C sort | uniq -c | LC_ALL=C sort -nr
which gives you, for example, the outdegree of each subject. Windows tools might slow you down in the long term.
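If a Unix shell is not at hand, the same outdegree count can be approximated in Python. This is a minimal sketch, assuming N-Triples-style lines that start with the subject IRI. The function names and the example.org triples are hypothetical, and unlike the pipeline it skips lines that do not start with "<" (comments, for instance).

```python
import bz2
import io
from collections import Counter


def subject_outdegree(lines):
    """Count how many triples each subject has (its outdegree)."""
    counts = Counter()
    for line in lines:
        if not line.startswith("<"):
            continue  # skip comments and blank lines
        # Same idea as `cut -f1 -d '>'`: keep everything up to the first '>'
        subject = line.split(">", 1)[0] + ">"
        counts[subject] += 1
    return counts


def outdegree_from_bz2(source):
    """`source` is a path or a binary file object holding .ttl.bz2 data."""
    with bz2.open(source, mode="rt", encoding="utf-8") as fh:
        return subject_outdegree(fh)


# Demo on a tiny in-memory archive instead of the real download
sample = (
    "# comment line\n"
    "<http://example.org/a> <http://example.org/p> <http://example.org/x> .\n"
    "<http://example.org/b> <http://example.org/p> <http://example.org/y> .\n"
    "<http://example.org/a> <http://example.org/q> <http://example.org/z> .\n"
)
archive = io.BytesIO(bz2.compress(sample.encode("utf-8")))
ranked = outdegree_from_bz2(archive).most_common()
```

For the real file, passing the local path of the downloaded .ttl.bz2 to outdegree_from_bz2 should work the same way. Note that the Counter keeps all subjects in memory, whereas the sort-based pipeline spills to disk.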