                    Netcache V0.21 alpha        21/2/95

    Thank you for downloading the Netcache utility program. This program, when 
run in conjunction with the Netscape Navigator (tm), will allow offline reading 
of world wide web pages via Netscape.

Any comments, suggestions for improvement and bug reports should be sent via 
private e-mail to me at

nmott@cix.compulink.co.uk

and I will attempt to address them.

This program, although written and copyrighted by myself, is FREEWARE. Please 
feel free to distribute it (or not) to whomever you wish. If you do, however, I 
would request that you make no modification to the executable or documentation 
without my consent and that it is distributed in the original ZIP archive 
format supplied, so that the code and documentation remain together.

Other than that, I hope you find the program useful and enjoyable, as I have 
done in producing it.

Thanks
        Neil Mottershead

------------------------------------------------------------------------------

Features of Netcache V0.1 alpha

    Copies all files from netscape cache into separate directory and renames 
them to sensible(ish) dos names.

    Parses all html files and converts pages and image references to file 
references.

    Generates a report of missing images and orphaned images

    Generates a netscape history file of all available cached pages

    Allows manual addition and linking of page/image files

------------------------------------------------------------------------------

Additions for Netcache V0.2

    All pages and images are now copied to the same directory and the parsed 
html files no longer contain path information.

    Additional option to delete html references to nonexistent images to 
avoid stalling netscape when viewing. 

    Additional option to delete all inline image references from the pages

    Additional option to show directory of cached pages

    Additional option to copy pages to another path (with inline images)

    Additional option to force reparsing of all html files

Additions for Netcache V0.21

    Addition of optional cache directory specifier

    Modification of url parser to cope with some strange urls

    Reduction of heap usage to allow handling of larger cache directories

------------------------------------------------------------------------------

Configuring Netcache


    The netcache program requires a configuration file to tell it the location 
of the netscape directory and where to write the database of cached pages 
and images.

    The configuration file is called netcache.cfg and must reside in the same 
directory as netcache.

    You must manually create the directory that will receive the image/pages 
generated by netcache.

    I suggest that the netcache program/config file lives in the same 
directory as netscape and that the netcache files are placed in a subdirectory 
of the netscape directory called netcache.

    The config file is an ascii file with four lines

    <path to images> 
    <path to pages>
    <path to netscape>
    <path to netscape cache>

    A typical config file would be

    c:\easynet\netscape\netcache\
    c:\easynet\netscape\netcache\ 
    c:\easynet\netscape\
    c:\easynet\netscape\cache\

    This specifies that copied pages/images go into the directory 

        c:\easynet\netscape\netcache

    whilst the netscape files are located in directory

        c:\easynet\netscape

    The netscape cache is to be in directory    

        c:\easynet\netscape\cache
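
    To make the layout of the config file concrete, here is a minimal Python 
sketch that reads the four lines in the order given above. It is only an 
illustration and is not part of netcache itself: the names NetcacheConfig and 
parse_config are made up for this example, and the handling of trailing 
spaces and blank lines is an assumption.

```python
from typing import NamedTuple


class NetcacheConfig(NamedTuple):
    """The four paths listed in netcache.cfg, in file order (hypothetical type)."""
    images: str
    pages: str
    netscape: str
    cache: str


def parse_config(text: str) -> NetcacheConfig:
    """Parse the contents of a four-line netcache.cfg file."""
    # Assumption: surrounding spaces are ignored and blank lines are skipped.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 4:
        raise ValueError(f"expected 4 paths, found {len(lines)}")
    return NetcacheConfig(*lines)


# The typical config file shown above, written as a Python string.
EXAMPLE = (
    "c:\\easynet\\netscape\\netcache\\\n"
    "c:\\easynet\\netscape\\netcache\\ \n"
    "c:\\easynet\\netscape\\\n"
    "c:\\easynet\\netscape\\cache\\\n"
)

if __name__ == "__main__":
    print(parse_config(EXAMPLE))
```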

------------------------------------------------------------------------------ 
Running Netcache

    Once configured, netcache can be run by typing the following at a dos 
prompt.

    Netcache

    The program will then process all files in the netscape cache directory, 
modifying them and then copying them to the netcache directory.

    To view the resultant pages fire up netscape and select the

    File/Open File 

    menu item or press <Ctrl> O

    Select the netcache directory using the mouse to display a list of 
available pages and click on a page to view it.

    The page will now be displayed complete with inline images.

    Once the netscape cache has been processed, its contents can be deleted by 
typing the following from within the netscape cache directory

    del *.* 

    Sometimes the netcache directory does not contain all the necessary files 
to display a web page, i.e. there are missing images.  If this happens netscape 
will attempt to retrieve the file from the server. As netscape is offline, it 
will not be able to contact the server and will appear to hang. To view the 
page click on the netscape 'Stop' button to abort the connection. See later 
for ways to overcome this problem. 

------------------------------------------------------------------------------ 
Netcache operation

    As well as renaming the netscape cache entries to meaningful dos names, 
netcache also attempts to assist Netscape to find and display inline images 
and linked pages.

    Netcache also builds up an index of pages and images in the files page.idx 
and image.idx. For each image/page there is one line that holds the filename 
and the site / page / name of the entry.  If you wish to delete any files from 
the netcache directory (i.e. orphan images) I strongly recommend editing the 
relevant .idx file and removing its entry by deleting the line and resaving.
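
    As a rough illustration of that edit, the Python sketch below removes the 
line for one file from the text of an index. It is not part of netcache. The 
exact layout of the .idx files is not described here, so the sketch ASSUMES 
that each line starts with the filename followed by whitespace; the function 
name drop_index_entry is made up for the example. Check a real .idx file 
before relying on anything like this.

```python
def drop_index_entry(index_text: str, filename: str) -> str:
    """Return the index text without the line belonging to `filename`.

    Assumption: the filename is the first whitespace-separated field on
    each line. The real .idx layout may differ.
    """
    kept = []
    for line in index_text.splitlines(keepends=True):
        fields = line.split()
        # DOS filenames are case-insensitive, so compare in lower case.
        if fields and fields[0].lower() == filename.lower():
            continue  # skip (delete) the entry for this file
        kept.append(line)
    return "".join(kept)
```

    All other lines are passed through unchanged, so the result can be saved 
back over the original index once the file itself has been deleted.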

    Within an html document an inline image or linked page is referenced with 
an embedded command like the ones below

    <img src = http://www.easynet.co.uk/icons/easylogo.gif>
    <A HREF=pages/tourist/tour.htm>

    This tells netscape where to find the page/image, in this case on a world 
wide web server.

    When netscape loads this page, it puts it in the netscape cache with an 
internal name, and writes a reference to it to a file called FAT that also 
lives in the netscape cache directory. It then does the same thing for all the 
inline images used within that page.  However, I believe that once you quit 
netscape and then start a new session, it does not attempt to use this cache 
info for reasons beyond my comprehension.

    Netcache recovers this cache information and builds its own cache 
database. In order for netscape to know that the files are now on the disk the 
html files have to be amended as follows

    <IMG SRC=file:/easylogo.gif>
    <A HREF=file:/tour.htm>

    This tells netscape that the files are located in the current directory 
and not on the web server.
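
    The kind of rewrite described above can be sketched in a few lines of 
Python. This is an illustration only and is not how netcache itself is 
written. It ASSUMES the local filename is simply the last part of the url, 
whereas netcache renames files to its own dos names; the names REF and 
localise are made up for the example.

```python
import re

# Matches SRC= or HREF= followed by an optionally quoted value.
REF = re.compile(r'''\b(src|href)\s*=\s*["']?([^"'\s>]+)["']?''', re.IGNORECASE)


def localise(html: str, cached: set) -> str:
    """Point references at local files when their base name is in `cached`."""
    def swap(match):
        attr, url = match.group(1), match.group(2)
        # Assumption: the local name is the last path component of the url.
        name = url.replace("\\", "/").rsplit("/", 1)[-1].lower()
        if name in cached:
            return f"{attr.upper()}=file:/{name}"
        return match.group(0)  # leave references that are not in the cache
    return REF.sub(swap, html)


if __name__ == "__main__":
    page = ("<img src = http://www.easynet.co.uk/icons/easylogo.gif>\n"
            "<A HREF=pages/tourist/tour.htm>\n")
    print(localise(page, {"easylogo.gif", "tour.htm"}))
```

    References to files that are not in the cache are left pointing at the 
web server, which is why netscape may still try to connect for them.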

Sometimes netscape deletes files from its cache just prior to exiting the 
program for reasons unknown to me.  It is possible to recover these images if  
you run an Undelete program on the cache directory. You will unfortunately 
have no immediate information about the nature of these files, but with a 
little ingenuity it is possible to identify them and netcache provides a 
mechanism for manually inserting them into the database.

------------------------------------------------------------------------------ 
Netcache options

    The following options affect the conversion process from the netscape 
cache to the netcache directory


    Netcache /P

        This option causes any inline image references to be deleted from the 
modified page when the image is not present in the cache.

    Netcache /A

        This option causes all pages to be reparsed irrespective of whether or 
not any pages/images have been added. You will need to specify this option if 
you wish to apply the /P option AFTER you have converted the pages.


    Netcache /L

        This option (Lynx mode) will strip all inline graphics from converted 
pages leaving just the text and page links. It must be specified with the /P 
option to work in any meaningful way.
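
    The effect of /P and /L on a page can be pictured with the Python sketch 
below. It is an illustration only, not netcache's own code: the names IMG and 
strip_images are made up, the strip_all flag stands in for the /L option, and 
the sketch ASSUMES that a cached image is identified by the last part of its 
url.

```python
import re

# Matches a whole <img ...> tag and captures the value of its SRC attribute.
IMG = re.compile(r'''<img\b[^>]*?\bsrc\s*=\s*["']?([^"'\s>]+)[^>]*>''',
                 re.IGNORECASE)


def strip_images(html: str, cached: set, strip_all: bool = False) -> str:
    """Delete <img> tags whose image is not cached, or all of them."""
    def keep_or_drop(match):
        # Assumption: the cached name is the last path component of the url.
        name = match.group(1).replace("\\", "/").rsplit("/", 1)[-1].lower()
        if strip_all or name not in cached:
            return ""  # remove the reference entirely
        return match.group(0)  # the image is available, keep the tag
    return IMG.sub(keep_or_drop, html)
```

    With strip_all left off, only references to images that are absent from 
the cache are removed; with it on, every inline image goes and only the text 
and page links remain.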


The following options work on the converted image/page database in the 
netcache directory and not with the netscape cache directory files 
    
    Netcache /F <filename> <url> <type>

        This option allows the user to manually add an image/page to the 
database.  This is extremely useful in the case where an inline image has not 
been downloaded and you wish to install an image manually rather than delete 
the image reference with the /P option.

Adding an image manually is a two stage process: firstly you must acquire the 
image to be installed and secondly you must copy and link it into the database.

As an example, let us say that we have the html file cafe.htm and that it is 
missing the inline image 'minicyb.gif'

Image Acquisition

There are three options open to us

a/  Undelete option.

    Use an Undelete utility in the netscape cache directory and examine each 
of the undeleted .MOZ files for a gif header (GIF87 or GIF89).  These files can 
then be renamed to .GIF and viewed using a gif viewer. If we have a rough idea 
of the nature of the image it should be fairly straightforward to identify it 
and then to rename it to 'minicyb.gif'.

b/ Web retrieval

Find the full http specification of the image either by examining cafe.htm or  
by viewing the netcache .RPT report file section on missing images which in 
this case would be

http://www.easynet.co.uk/icons\minicyb.gif
	
    Connect to your provider and open this as a url and netscape will retrieve 
it and save it straight to disk.

c/  Improvisation

    Take any .gif file on hand similar to the one required and copy it to 
'minicyb.gif'. When the missing image is just a bullet image (e.g. 
blueball.gif) then it is perfectly acceptable to borrow a similar image from a 
different page and reuse it.
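
    For the undelete option (a/) above, checking each recovered file by hand 
can be tedious. The Python sketch below shows one way the header check might 
be automated. It is not part of netcache, and the function names are made up 
for the example. A gif file starts with the six bytes GIF87a or GIF89a.

```python
from pathlib import Path


def looks_like_gif(data: bytes) -> bool:
    """Return True if the data begins with a GIF87a or GIF89a signature."""
    return data[:6] in (b"GIF87a", b"GIF89a")


def find_gifs(cache_dir: str) -> list:
    """List the .MOZ files in `cache_dir` whose contents start like a gif."""
    found = []
    for path in sorted(Path(cache_dir).iterdir()):
        if path.is_file() and path.suffix.lower() == ".moz":
            with path.open("rb") as handle:
                # Only the first six bytes are needed for the check.
                if looks_like_gif(handle.read(6)):
                    found.append(path.name)
    return found
```

    The files it lists still have to be renamed to .GIF and viewed to decide 
which one is the missing image.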

Insertion in database

    Now that we have got our image we need to install it in the database. As 
outlined before the usage is 

    Netcache /f  filename url image/text 

    in our case this would become

Netcache /f minicyb.gif http://www.easynet.co.uk/icons\minicyb.gif image

The source .gif file must reside in the netcache program directory and 
netcache will then copy it to the netcache file directory and will reparse all 
htm files and update any links to the image. This will obviously not work if 
you have run netcache with the /P option to delete missing image references so 
be warned.

Page references can similarly be added but the last command line parameter 
must be changed from 'image' to 'text' so that the reference is added to the 
correct index.


    Netcache /d  match-pattern

        netcache will print out the names of the pages held in its netcache 
directory and has a wildcard selection mechanism

    Netcache /d *                           displays all pages
    Netcache /d /                           displays all home pages
    Netcache /d http://www.easynet.co.uk    displays easynet pages
    Netcache /d index                       displays the full url of all
                                            pages called index
    Netcache /d /stargate/*                 lists all pages with the
                                            stargate path


    Netcache /c match-pattern path

        netcache will copy all matching pages(complete with their inline 
images) to the specified path

    Netcache /c index a:\

        will copy any index pages (with images) to the A drive.

------------------------------------------------------------------------------
 Netcache /H

        This option displays the copyright banner.

------------------------------------------------------------------------------

Summary of command line options

Netcache                        Default operation
Netcache /A /P                  Strip refs to missing inline images
Netcache /A /L /P               Strip refs to all inline images
Netcache /F filename url type   Manually insert page/image into database
Netcache /D url-pattern         List cached web pages
Netcache /C url-pattern path    Copy cached web pages to another drive/directory
Netcache /H                     Display help/copyright banner
Netcache /G                     Windows in-joke

------------------------------------------------------------------------------

