February 2012 update allows quick partial crawling of multiple sites.
Instructions for creating networks for large collections of web sites.
SocSciBot 4.1 has a button for calculating link networks for the links between a set of web sites, plus improved network diagram functions.
SocSciBot 4 blog has more information and bug reports.
Web crawler and link analyser for the social sciences
SocSciBot is a Web crawler for link analysis research on a single web site or collection of sites, or for text search/analysis on a collection of sites. Free SocSciBot download.
SocSciBot (a) crawls one or more web sites and (b) analyses them to produce standard statistics about their interlinking and network diagrams of the interlinking. It also runs limited analyses of the text in the web sites. To analyse links to one or more web sites, use Webometric Analyst instead. SocSciBot can export network diagrams to Pajek and to UCINET. See the quick network tutorial.
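To make the idea of an interlinking network concrete, the following is a minimal sketch of how page-level links could be aggregated into site-level link counts and written out as Pajek `.net` text. It is not SocSciBot's own code or file format: the function names, the use of the hostname as the "site", and the example links are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_of(url):
    """Return the lower-cased hostname of a URL, used here as the 'site'."""
    return (urlsplit(url).hostname or "").lower()


def intersite_counts(page_links):
    """Count links between different sites from (source_url, target_url) pairs."""
    counts = Counter()
    for source, target in page_links:
        a, b = site_of(source), site_of(target)
        if a and b and a != b:  # skip malformed URLs and links within one site
            counts[(a, b)] += 1
    return counts


def to_pajek(counts):
    """Render directed, weighted site-to-site counts as Pajek .net text."""
    sites = sorted({s for pair in counts for s in pair})
    index = {site: i for i, site in enumerate(sites, start=1)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(sites)}"]
    lines += [f'{index[s]} "{s}"' for s in sites]
    lines.append("*Arcs")
    lines += [f"{index[a]} {index[b]} {n}" for (a, b), n in sorted(counts.items())]
    return "\n".join(lines) + "\n"


# Hypothetical page-level links, standing in for real crawl output.
example_links = [
    ("http://a.example.org/index.html", "http://b.example.org/"),
    ("http://a.example.org/page2.html", "http://b.example.org/about"),
    ("http://b.example.org/", "http://c.example.org/"),
    ("http://a.example.org/", "http://a.example.org/page2.html"),
]

if __name__ == "__main__":
    print(to_pajek(intersite_counts(example_links)), end="")
```

In this sketch each arc weight is the number of page-level links from one site to another, and links within a single site are ignored.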
SocSciBot and associated software: Conditions of use
- Licence: SocSciBot is licensed free of charge for non-commercial purposes only. We accept no liability for any damage resulting from its use, or for loss of data or other problems caused by the operation of the downloaded programs.
- Notification ethics: You must enter your email address into the program when requested, and check that address for complaints about your web crawling. SocSciBot will email the webmasters of the sites that you crawl so that they know you are crawling and can email you to ask you to stop. This also safeguards you from them emailing your boss instead of you to complain!
- Bandwidth care: You must not overload the web servers that you are crawling by crawling them repeatedly (e.g., every day), and it is your responsibility to ensure that you use the software carefully. It is a condition of using SocSciBot on web servers other than your own that you first read the following paper, which discusses the ethical issues of crawling. [Thelwall, M. & Stuart, D. (2006). Web crawling ethics revisited: Cost, privacy and denial of service. Journal of the American Society for Information Science and Technology, 57(13), 1771-1779.] Please do not use SocSciBot to crawl the web sites of organisations that may not be able to afford the bandwidth that you are using (e.g., crawls of large web sites in poor countries other than your own).
- Complaints: Your use of SocSciBot will be remotely logged to ensure that it is not being used in an unethical way, and to help identify the cause of any complaints. Except in the case of apparent unethical use, this information will not be revealed to a third party or used for any other purpose. (This has not happened yet.) Your copy of SocSciBot may be remotely disabled without warning if there are complaints.
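As an illustration of the bandwidth care condition above, the sketch below shows one common way a crawler can avoid overloading a server: enforcing a minimum gap between successive requests to the same host. This is not how SocSciBot implements its speed restriction, which is not described here. The class name `HostThrottle`, its parameters, and the 0.2 second gap in the demonstration are hypothetical.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum gap between successive requests to the same host."""

    def __init__(self, min_gap_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap_seconds
        self.clock = clock    # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self.last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Pause until the host of `url` may be contacted again; return the pause."""
        host = (urlsplit(url).hostname or "").lower()
        now = self.clock()
        waited = 0.0
        if host in self.last_request:
            remaining = self.min_gap - (now - self.last_request[host])
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        self.last_request[host] = self.clock()
        return waited


if __name__ == "__main__":
    # Short gap purely for demonstration; a real crawl would use a longer one.
    throttle = HostThrottle(min_gap_seconds=0.2)
    for page in ["http://a.example.org/1", "http://a.example.org/2"]:
        paused = throttle.wait(page)
        print(f"paused {paused:.2f}s before requesting {page}")
```

A crawler would call `wait(url)` immediately before each download. Keeping the delay per host means that crawling several sites in one session does not slow every site down to the pace of the busiest one.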
The program is available free of charge from here, together with processing tools. Please collect your data with SocSciBot before starting the other programs as they automatically process the results of SocSciBot.
Please note that no technical support is provided.
The program runs on Windows only, crawls sites of up to 15,000 pages, and has a speed restriction. If you wish to crawl more pages or crawl faster, please email your request. For example, we allow faster and bigger crawls of the university web sites of richer countries.
SocSciBot may not work well on web sites with non-ASCII URLs (e.g., Chinese) and the text analysis does not work well with non-ASCII text.
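For readers unfamiliar with the issue, the sketch below shows the standard way a non-ASCII URL can be expressed using ASCII characters only: the host name is converted with IDNA (Punycode) and the rest of the URL is percent-encoded as UTF-8. Whether SocSciBot handles URLs converted this way is not stated here, so treat this as background only. The function name `to_ascii_url` and the example addresses are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Return an ASCII-only form of `url` (user:password parts are dropped)."""
    parts = urlsplit(url)
    # Host names use IDNA (Punycode); Python's built-in codec implements IDNA 2003.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Keeping "%" safe avoids double-encoding escapes that are already present.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(to_ascii_url("http://bücher.example/中文?q=é"))
```

URLs that are already ASCII pass through unchanged, so the function can be applied to a whole list of start URLs without checking each one first.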
There is an article describing the database structure and crawler, linked from the cybermetrics database site. Please ignore all the numbers reported by the program, both in its title bar and in the summary file it produces; these are for testing purposes only. The reliable information is in the link data file and the text data file, but you may need to use the cybermetrics programs to get at this information.
SocSciBot can be used on its own or in conjunction with the link analysis book or the Introduction to Webometrics book.