WebSuck 0.76b

Web Spider for Java 2 v1.3

FREEWARE

Please report pages that you have problems downloading automatically!
This will help me tremendously in developing WebSuck!


WebSuck meets WebGet!

GetRight becomes sluggish when downloading large numbers of files,
so WebGet was created to meet the demand for easier downloading!
WebGet is the perfect downloader for your WebSuck-created URL files!

Check it out now!

Contact me via e-mail at:
software@ake.nu

Web Spider
WebSuck goes through the web-pages you specify and checks for links and data files. The links are followed, and the data files are output in the format of your choice (plain text or GetRight format!).
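
The split between links that are followed and data files that are listed can be pictured with a small Java sketch. This is not WebSuck's actual code: the class name LinkClassifier, the extension lists and the rule for extension-less URLs are all assumptions made for illustration. It only mirrors what this page states, namely that extensions decide whether a file is parsed or treated as data (see -parseext and -dataext) and that upper-case extensions are recognized too. It needs a much newer Java than the Java 2 v1.3 that WebSuck targets.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not WebSuck's actual code.
class LinkClassifier {
    enum Kind { PARSE, DATA, IGNORE }

    // Assumed example lists; WebSuck's real defaults are not listed in this document.
    static final Set<String> PARSE_EXT = new HashSet<>(Arrays.asList("html", "htm"));
    static final Set<String> DATA_EXT = new HashSet<>(Arrays.asList("jpg", "jpeg", "gif", "zip"));

    static Kind classify(String url) {
        // URI.getPath() leaves out the host, the query string and the fragment.
        String path = URI.create(url).getPath();
        if (path == null) {
            return Kind.IGNORE; // e.g. mailto: links have no path
        }
        int dot = path.lastIndexOf('.');
        if (dot <= path.lastIndexOf('/')) {
            return Kind.PARSE; // assumption: no extension means a page or directory index
        }
        // Lower-case the extension so that "pic.JPG" and "pic.jpg" are treated alike.
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (DATA_EXT.contains(ext)) {
            return Kind.DATA;
        }
        if (PARSE_EXT.contains(ext)) {
            return Kind.PARSE;
        }
        return Kind.IGNORE;
    }

    public static void main(String[] args) {
        // Classify every URL given on the command line.
        for (String url : args) {
            System.out.println(classify(url) + " " + url);
        }
    }
}
```

The sketch looks only at the URL. It does not model the rule that references from IMG tags are always treated as data.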

The program is best suited for downloading web galleries with large numbers of photos. There are plenty of options for adapting WebSuck to most sites' layouts.

All command-line options can now be accessed from the GUI! (new from version 0.6b)
Just run WebSuck with one parameter: -gui

WebSuck does NOT download the files found. You need to use the files with a file downloader, like WebGet.
(You can also use the output of the program with software like wget on UNIX or GetRight on Windows.)

Screenshot: WebSuck in GUI mode

Screenshot: WebSuck in console mode

GUI Mode

Start WebSuck with -gui as one of the parameters, and the Swing GUI will be displayed!

java -jar WebSuck.jar -gui <other arguments>

Command Line Arguments

java -jar WebSuck.jar <options> <url1> <url2> <url3> ... <urlN>

At least one URL must be specified, either on the command line or in a URL file (see below).

Example: java -jar WebSuck.jar -l 3 -noexternal http://www.ake.nu/
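
If WebSuck is started from another Java program, the same command line can be assembled as a list of arguments. The sketch below is only an illustration: the class WebSuckCommand and its build method are hypothetical names, and the check simply mirrors the rule above that at least one URL must come from the command line or from a URL file (the -i option). The launch itself is left commented out because it needs WebSuck.jar in the working directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for assembling: java -jar WebSuck.jar <options> <url1> ... <urlN>
class WebSuckCommand {
    static List<String> build(String jarPath, List<String> options, List<String> urls) {
        // At least one URL is required, either directly or through a URL file (-i file).
        if (urls.isEmpty() && !options.contains("-i")) {
            throw new IllegalArgumentException("need at least one URL or an -i file");
        }
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-jar", jarPath));
        cmd.addAll(options); // options come first ...
        cmd.addAll(urls);    // ... followed by the start URLs
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("WebSuck.jar",
                Arrays.asList("-l", "3", "-noexternal"),
                Arrays.asList("http://www.ake.nu/"));
        System.out.println(String.join(" ", cmd));
        // To launch it for real (WebSuck.jar must exist in the working directory):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```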

Command Line Options

Switch Function
-gui Load the GUI. Can be combined with other parameters to change 'default' start-up settings. Also see -width and -height below.
-h, --help, or -? Display a list of command-line options
-o file The WebSuck results are output into this file.
-l # Limits the web suck to a depth of #. The start page has level 1, a page linked from the start page has level 2, and so on.
-getright x:\path\ Makes WebSuck output the data list in GetRight format. The files will be downloaded into "x:\path\<hostpath>".
-onedir Only applies when -getright is used. Saves all files in one directory.
-quiet or -verbose Toggles verbosity: -quiet displays only the resulting data files; -verbose displays more information than normal mode.
-noimg Makes WebSuck ignore files found in IMG tags.
-option Scans OPTION tags for links and data. Doesn't add to document depth!
-usebody Scan BODY tag for background image.
-nooption Don't follow links found in OPTION tags (combo boxes)
-noexternal Makes WebSuck skip links pointing outside the parent document's host.
-datalast Only adds data files found in documents at the depth limit (see -l). This is perfect for image galleries: you get just the big files, not all the thumbnails.
-lastlinksonlydata Links leading to the maximum depth are followed only if they point to data files.
-imglinks Only follows links that are clickable images. Also good for thumbnailed galleries.
-u username Set the http username used for non-external documents.
-p password Set the http password used for non-external documents.
-i file Reads URLs to parse from the specified text-file.
-v file Reads URLs to skip parsing from the specified text-file. Useful if you WebSuck a site in multiple parts.
-outv file Outputs a list of the visited (parsed) URLs. Can be used to skip links that have already been followed in a new Web Suck.
-nocount word A link containing this text will not add to the depth of the WebSuck! Great for multi-part thumbnail galleries!
-parseext ext1,ext2 Files with these extensions will be parsed as HTML files.
-dataext ext1,ext2 Files with these extensions will be added to the download list.
-width # Change the default width of the GUI window, in pixels.
-height # Change the default height of the GUI window, in pixels.
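
How -l and -nocount interact is easiest to see on a small example. The following Java sketch walks an in-memory link map instead of real web pages, so it says nothing about how WebSuck itself is implemented. The class DepthLimitSketch, its crawl method and the page names are hypothetical, and it assumes that -nocount matches against the link's URL. The level numbering follows the table above: the start page has level 1 and a page linked from it has level 2.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a depth-limited walk, not WebSuck's actual code.
class DepthLimitSketch {
    // A page together with the level at which it was reached.
    static final class Visit {
        final String url;
        final int level;
        Visit(String url, int level) { this.url = url; this.level = level; }
    }

    // links: page -> pages it links to (stands in for fetching and parsing HTML).
    // depthLimit plays the role of -l #, noCountWord the role of -nocount (may be null).
    static List<String> crawl(Map<String, List<String>> links, String start,
                              int depthLimit, String noCountWord) {
        List<String> parsed = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<Visit> queue = new ArrayDeque<>();
        queue.add(new Visit(start, 1)); // the start page has level 1
        seen.add(start);
        while (!queue.isEmpty()) {
            Visit v = queue.remove();
            parsed.add(v.url);
            for (String target : links.getOrDefault(v.url, Collections.<String>emptyList())) {
                // Assumption: a -nocount link is recognized by its URL text.
                boolean free = noCountWord != null && target.contains(noCountWord);
                int nextLevel = free ? v.level : v.level + 1;
                // Follow the link only within the depth limit, and only once.
                if (nextLevel <= depthLimit && seen.add(target)) {
                    queue.add(new Visit(target, nextLevel));
                }
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = new HashMap<>();
        site.put("index.html", Arrays.asList("gallery1.html"));
        site.put("gallery1.html", Arrays.asList("gallery1_page2.html", "big1.html"));
        site.put("gallery1_page2.html", Arrays.asList("big2.html"));
        System.out.println(crawl(site, "index.html", 3, null));
        System.out.println(crawl(site, "index.html", 3, "_page"));
    }
}
```

With a limit of 3 and no -nocount word, big2.html would sit at level 4 and is skipped. When "_page" is given as the word, gallery1_page2.html stays at level 2 and big2.html is reached at level 3, which is the multi-part gallery case described for -nocount.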

History

Version Changes

0.76b Retries 5 times before aborting download when datastream is stalled.
0.75b Removed requirement for "window.open" in javascript. Now takes first available argument as file.
0.74b Fixed all 'external link' checking so that it only matches 'domain' part (*.domain.com).
0.73b Fixed dead-locks when downloading certain files (e.g. empty files).
0.72b Fixed commenting of Homepage address in plain-text output file.
0.71b New options: -u <user> -p <pass>
0.70b New option: -lastlinksonlydata.
0.69c Added support for var='data' in tags, instead of just var="data" and var=data. (Thanks to Fernando Cassia for pointing this out)
0.69b Fixed the 'sticky' vertical scroll bar. Now it should (heh) work...
New option: -usebody. Gets the document background image from the BODY tag.
Added some default data and parse extensions.
Probably something more...
0.68b Changed the vertical scroll bar of the display window. The scroll bar now sticks to the bottom if not moved, or stays in place if moved. You can also 'stick' the scroll bar again by moving it to the bottom position.
0.67b Added file dialogs to select the different files for input and output.
Did a minor change in the HTML parser, it now works with unclosed tags.
Added -width and -height options which let you set the size of the GUI window.
0.666b You now can change the extension-lists controlling what files are treated as parse files and what files are treated as data files.
Note: References from IMG tags are always treated as data!
0.665.91b Two options were missing in the GUI. The -nooption and -noimg options have been added to the GUI! (oops!)
0.665.9b Just added an icon for the GUI window. Nice, eh? :-)
0.65b GUI options didn't affect anything if they were already set from the command-line. This has been fixed so GUI options always override the command-line options!
0.64b A bug was fixed, WebSuck only recognized extensions written in lower-case. The GUI layout had some minor changes as well.
0.63b Changed the thread handling to comply with new API. Also changed the text displayed when WebSuck is run without parameters.
0.62b GUI layout improved! Now you can also combine the -gui parameter with all other parameters, to change the start-up settings of WebSuck! The -gui parameter must be the first parameter on the command-line.
0.6b GUI mode! Start WebSuck with -gui as only parameter, and a Swing GUI will be displayed, allowing for fast and easy use!
0.5b Lost notes
0.442b New option: -onedir. By default, the GetRight format now places files in a directory structure like the URL. This option restores the old behaviour of placing all files in one directory.
Changes:
The GetRight path can be entered with or without a trailing '\'.
Fixes:
The HTML parser now sees unterminated links.
There were some errors with depth limits on data links.
0.44b New option: -nocount. Follows links with a certain text without adding to depth. Ideal for gallery-pages where some galleries are divided into several files!
0.43b Now compiled with Java 2 v1.3.0. Some minor code changes.
0.42b Now can also follow some javascript links, like "window.open()".
Can output a list of the visited pages, for use in another WebSuck.
0.4b First public release

License Agreement

Disclaimer
By using this software, you agree that the Author, Åke Wallebom, cannot be held responsible for any damages caused by the use or installation of the software.
The software is distributed "AS IS" with no warranty whatsoever.

The program, WebSuck, hereafter called 'the software', is Copyright © 2001 Åke Wallebom. All Rights Reserved.

The software can be copied/distributed in any form on/through any type of media by anyone.
The archive must be kept intact and contain all the files of the original archive.

You can use the software as long as you like, free of charge.

For distributors of Shareware CD-ROMs/other media
The software can be included on any shareware CD-ROM. However, you must e-mail me in advance to tell me that the software will be distributed on a CD-ROM, unless I have already been notified and have given my consent.

For Shareware services, such as internet shareware archives or BBS systems
The site/system must offer the software to the user without cost (no entry fees, download fees, etc.).
It can, however, be a commercial system or site, in that it charges companies for commercial spots, etc.

 

