News
October 2008: The February 2009 update of SocSciBot 4.1 solves the freezing problem, allows multiple simultaneous crawls, and fixes a problem with banned lists.
March 2008: There is a SocSciBot 3 blog and a SocSciBot 4 blog for more information and for reporting bugs.
SocSciBot
Web crawler and link analyser for the social sciences
SocSciBot is a Web site crawler designed for link analysis research. It can be used to conduct link analysis on a single site or collection of sites, or to run a search engine on a collection of sites. It can also be used in teaching, to illustrate how link analysis and search engines work. Note that SocSciBot does not work well on web sites with non-ASCII URLs and Cyclist does not work well on web sites with non-ASCII text.
SocSciBot works by (a) crawling one or more web sites and then (b) analysing them to produce standard statistics about the interlinking between the sites and network diagrams of the interlinking. It can also run a limited linguistic analysis of the text in the collection of web sites. If you need to analyse links to one or more web sites, then LexiURL Searcher is recommended instead.
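As an illustration of the kind of interlinking statistic described above, the following minimal Python sketch counts links between distinct sites. It is not SocSciBot's own code: the input format (a list of source and target URL pairs), the function names `site_of` and `count_intersite_links`, and the use of the host name as the "site" are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_of(url):
    """Return the lower-cased host name of a URL, used here as the 'site'."""
    return (urlsplit(url).hostname or "").lower()


def count_intersite_links(links):
    """Count links between distinct sites.

    `links` is an iterable of (source_url, target_url) pairs. This is a
    hypothetical input format, not SocSciBot's own link data file.
    """
    counts = Counter()
    for source_url, target_url in links:
        source, target = site_of(source_url), site_of(target_url)
        # Links within one site are ignored; only interlinking is counted.
        if source and target and source != target:
            counts[(source, target)] += 1
    return counts


example_links = [
    ("http://a.example/index.html", "http://b.example/about.html"),
    ("http://a.example/staff.html", "http://b.example/"),
    ("http://b.example/", "http://a.example/index.html"),
    ("http://a.example/index.html", "http://a.example/staff.html"),  # internal
]
intersite = count_intersite_links(example_links)
```

The resulting counts, keyed by (source site, target site), are the sort of data from which a network diagram of the interlinking could be drawn.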

SocSciBot and associated software: Conditions of use
- SocSciBot is licensed for non-commercial purposes only. We also do not accept liability for any damage resulting from its use, or for loss of data or other problems caused by the operations of the programs downloaded.
- You must enter your correct email address into the program, when requested, and check your email during periods of crawling (in case webmasters complain about your web crawling). SocSciBot will automatically email the webmasters of sites that you crawl so that they know you are crawling and have the option to email you to tell you to stop crawling.
- You must not use SocSciBot to crawl web sites of organisations that may not be able to afford the bandwidth that you are using (e.g., in poorer countries).
- You must not overload the web servers that you are crawling by crawling them repeatedly (e.g., every day), and it is your responsibility to ensure that you use the software carefully. It is a condition of using SocSciBot on web servers other than your own that you first read the following paper for a discussion of ethical issues for crawling. [Thelwall, M. & Stuart, D. (2006). Web crawling ethics revisited: Cost, privacy and denial of service. Journal of the American Society for Information Science and Technology.]
- You must accept that your copy of SocSciBot may be remotely disabled without warning, for example if there are any complaints about its use.
- You must accept that your use of SocSciBot will be remotely logged. This is to ensure that it is not being used in an unethical way, or to identify the cause of complaints. Except in the case of apparent unethical use, this information will not be revealed to a third party or used for any other purpose. (This has not happened yet.)
The program is available free of charge from here, together with processing tools. Please collect your data with SocSciBot before starting the other programs as they automatically process the results of SocSciBot.
SocSciBot and associated software: Downloads and instructions
Download the programs only if you accept the conditions of use.
Tutorials and extra information for SocSciBot 4. (If using SocSciBot 3, follow the link to the SocSciBot 3 FAQ and Tutorials.)
Please note that no technical support is provided.
The program runs on Windows 95 and later. It will crawl sites with up to 15,000 pages and has a speed restriction. If you wish to crawl more pages or to crawl faster, please email your request. For example, we allow faster and bigger crawls of the university web sites of richer countries.
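To show what a page limit and a speed restriction mean in practice, here is a minimal Python sketch of a breadth-first crawl loop that stops at a page cap and pauses between requests. It is not how SocSciBot is implemented: the function `bounded_crawl`, the caller-supplied `fetch_links` placeholder, and the one-second default delay are assumptions for the example; only the 15,000-page figure comes from the text above.

```python
import time
from collections import deque


def bounded_crawl(start_url, fetch_links, max_pages=15000, min_delay=1.0,
                  sleep=time.sleep):
    """Breadth-first crawl that stops at max_pages and pauses between requests.

    fetch_links(url) is a caller-supplied function returning the URLs linked
    from a page; it stands in for real downloading and parsing.
    min_delay (seconds) is an assumed value, not SocSciBot's actual setting.
    """
    queue = deque([start_url])
    seen = {start_url}
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        if crawled:
            sleep(min_delay)  # wait before every request after the first
        crawled.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


# Usage with a tiny in-memory "site" instead of real HTTP requests.
graph = {"a": ["b", "c"], "b": ["c", "d"], "c": [], "d": []}
pauses = []
pages = bounded_crawl("a", lambda u: graph.get(u, []), max_pages=3,
                      min_delay=0.5, sleep=pauses.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real crawl would leave the default `time.sleep` in place.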
There is an article describing the database structure and crawler linked from the cybermetrics database site. Please ignore all the numbers reported by the program, both in its title bar and in the summary file produced; these are for testing purposes. The reliable information is in the link data file and the text data file, but you may need to use the cybermetrics programs to extract this information.
SocSciBot can be used on its own or in conjunction with the link analysis book.