Digital Library of India: Download all that you can...

Submitted by hpn on February 22, 2007 - 18:50

[:http://dli.iiit.ac.in/|Digital Library of India] has been unveiled, but with a shocker of an interface. Not much can be expected of a "Government of India" project, though, as they always manage to find just the right technologies (or people?) for the job. (Why, e-governance in India is all set to go Microsoft's way. When M$ boasts of its riches, we can all show our kids its logo and say that our government of poor people is one of its key customers.) They use the TIFF format for image scans of thousands of books, probably from libraries all over India. There are two petty interfaces, both of which need you to download a piece of software to view the books. The software, in turn, needs to be registered before you can use it. Last month it was the [:http://ildc.gov.in/Kannada/kdownload.htm|disappointing set of tools] released by TDIL, and this month, the DLI. The projects are obviously worth a lot, but shabbily done. The idea seems right, but the implementation is terribly bad.

A Quick Script

Torn between my interest in the books on DLI (which are otherwise not easily available) and the irritation of the shabby interface, I wrote a (shabby) script that batch-downloads the TIFFs. It is a quickly written script, put together with pieces from here and there, and it has many stupid parts (which are undoubtedly mine). But mainly it works, just like the projects I've been mentioning here. Serves me right, in a way. It needs you to paste URLs pointing to the TIFF of the starting page of each book, with the file name removed. Pretty clumsy, yes. But that was convenient for me, since I removed the frames from their web page while viewing, browsed through the list, and clicked on each book to check the quality. It saves the irritation for the next many pages. Ah, and you'll need to paste the URLs into a file. It is easy to modify for use on just a single book, anyway. Try it, modify it, and let me know if you improve it.
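Removing the file name from a page URL can be done by hand, but a small helper makes it less error-prone. The sketch below is only an illustration: the function name `base_url` and the example.invalid address are made up here and are not part of DLI or of the download script itself.

```shell
#!/bin/sh
# Hypothetical helper: drop the file name from a page URL, keeping the
# trailing slash so that a page file name can be appended directly.
base_url() {
    # ${1%/*} removes the shortest suffix that starts at the last "/".
    printf '%s/\n' "${1%/*}"
}

# Example with a made-up address (not a real DLI URL):
base_url "http://example.invalid/books/some-book/PTIFF/00000001.tif"
```

Running the helper once per book and appending its output to a file gives a URL list with one base URL per line.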


#!/bin/sh
# Get your favourite books from DLI: specify the start page and end page,
# and this script takes care of the rest.
# Caveat: you'll need to specify the base URLs, though (one per line, in a file).
PATH=/bin:/usr/bin:/usr/local/bin
progname=`basename "$0"`

# Both the start and the end page number are required.
if [ $# -ne 2 ]
then
    echo "$progname: usage: $progname start end" 1>&2
    exit 1
fi

start=$1
end=$2

echo "Enter path for the file to read:"
read file

#exec > $HOME/log_dli.txt

# Count the URLs; reading from stdin keeps the file name out of wc's output.
lns=`wc -l < "$file" | tr -d ' '`
echo "LNS: $lns"

x=1
# -le (not -lt), so that the last URL in the file is processed too.
while [ "$x" -le "$lns" ]
do
    # Pick line number $x of the URL file.
    url=`sed -n "${x}p" "$file"`

    # One directory per book, named after the line number of its URL.
    mkdir -p "$x"
    cd "$x" || exit 1

    # "+ 0" normalises input such as 08 to a plain decimal number.
    index=`expr "$start" + 0`
    while [ "$index" -le "$end" ]
    do
        # Page files carry an 8-digit, zero-padded number, e.g. 00000001.tif.
        page=`printf '%08d' "$index"`

        WGET_OUTPUT=`wget --timestamping --progress=dot:mega "$url$page.tif" 2>&1`
        status=$?

        if [ "$status" -ne 0 ]
        then
            # wget had problems: report them.
            echo "$progname: $WGET_OUTPUT" 1>&2
        fi
        if echo "$WGET_OUTPUT" | grep -F 'Not Found' > /dev/null
        then
            # Past the last page of this book: move on to the next URL.
            break
        elif [ "$status" -eq 0 ]
        then
            echo "~~~~ Page found. Downloaded. ~~~~"
        fi

        index=`expr "$index" + 1`
    done

    cd ..
    x=`expr "$x" + 1`
done

Note: Make sure each URL is in the form

http://dli.iiit.ac.in//server12/disk3a/TO%20SUBMISSION/KANNADA/Bharatiya%20Tatva%20Shastra%20Samgraha//PTIFF

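and that each URL is on its own line in the file. The script appends the page file name (for example, 00000001.tif) directly to the URL, so the URL must end exactly where the file name would begin. The script doesn't detect empty lines, so don't leave any in the file.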
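Since empty lines in the URL file are not detected, it may help to clean the file first and to check what a page URL will look like before starting a long download. This is a minimal sketch under assumptions: the function names `clean_urls` and `page_url` and the example.invalid address are invented for illustration, and the eight-digit page number mirrors the prefix and padding that the script builds.

```shell
#!/bin/sh
# Hypothetical helper: print the URL list without blank or whitespace-only lines.
clean_urls() {
    grep -v '^[[:space:]]*$' "$1"
}

# Hypothetical helper: build the URL of one page from a base URL and a page number.
page_url() {
    # The base URL is passed as an argument, not as the format string,
    # so escapes such as %20 in it are left untouched.
    printf '%s%08d.tif\n' "$1" "$2"
}

# Example with a made-up base URL (not a real DLI address):
page_url "http://example.invalid/books/some-book/PTIFF/" 7
```

For instance, `clean_urls urls.txt > urls.clean.txt` (both file names are placeholders) would produce a list that is safe to hand to the script.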

See also:

[:http://hpnadig.net/notes/converting-and-merging-tiff-to-pdf|Converting Tiffs to PDF].
[:http://sampada.net/Kannada-ebooks-torrent-1-and-2-Index|Torrents to several books] I prepared using this script.

Enjaaay!

Sathish Nayak B (not verified)

Re: Digital Library of India: Download all that you can...

March 3, 2007 - 15:30 Permalink

hpn

Re: Digital Library of India: Download all that you can...

March 3, 2007 - 18:14 Permalink

Anonymous (not verified)

I dont know how to run this script and download the books.

March 27, 2007 - 15:11 Permalink

mala (not verified)

Dear Sir, Pls help me also,

March 19, 2008 - 12:56 Permalink

sharath (not verified)

dear mala copy the script

October 1, 2008 - 11:01 Permalink

hpn

Dear Sharath, This is a

October 1, 2008 - 16:37 Permalink

HIMANSHU SINGLA (not verified)

Script has some problem

October 2, 2008 - 12:54 Permalink

hpn

Himanshu, I checked the

October 2, 2008 - 18:53 Permalink

sharath (not verified)

dear people happy news for u

October 3, 2008 - 02:38 Permalink

kannan (not verified)

Re: dear people happy news for u

January 3, 2012 - 13:03 Permalink

jagadeesan (not verified)

same procedure do it in

The same procedure can be done in Internet Download Manager (popularly called IDM) for faster downloads.
After all the TIFF files are downloaded, convert them into one PDF as follows:
Download CutePDF Writer, which is free software, and install it (2 files: one is the exe, the other is the GPL updater).
After installing, open one book's folder of TIFF files, select all the TIFF files and choose Print, then select CutePDF Writer as the printer.
That's all.
It will scan all the images and open the save directory.
Give the file a name.
The PDF file will be created.

November 28, 2013 - 16:04 Permalink

sharath (not verified)

November 28, 2013 - 16:08 Permalink
