Tutorial 1: Making a Network Diagram with SocSciBot 4
Overview
This tutorial describes the simplest way to use SocSciBot 4 to create a network diagram of the hyperlinks between a collection of web sites.
**If you have a set of blogs or web sites to crawl, please copy these instructions but use your URLs instead of the italic URLs below.**
Step 0: Install SocSciBot 4
- Go to the SocSciBot web site http://socscibot.wlv.ac.uk/ and follow the link to download SocSciBot 4 if you agree with the conditions of use. Save SocSciBot 4 in a location where you have plenty of storage space for the crawl data.
Step 1: Crawl your sites
- SocSciBot works in two stages: first it crawls a set of web sites, then it analyses the links in the crawled pages. Start SocSciBot 4 by double clicking on the file called either SocSciBot4 or SocSciBot4.exe where you saved it on your computer. The first time you start SocSciBot, this should produce a dialog box similar to the one below; the dialog box does not appear on later starts.
- Confirm that the folder chosen by SocSciBot 4 to store your data is acceptable by clicking OK. Also enter your correct email address. It will be used to email the webmasters of any sites that you crawl. This is ethical practice and may also save you from getting into trouble if a webmaster is unhappy with you crawling their site - they can email you directly instead of emailing your boss or network manager. You can also enter a message to be included in the email to give the purpose of the crawl. You may wish to include the URL of a page with additional information about your project. Finally, answer any questions about the location of Microsoft Excel and Pajek - you can say NO to both of these.
- Enter test as the name of the project at the bottom of the next dialog box, Wizard Step 1, and then click on the start new project button. All crawls are grouped together into projects. This allows you to have different named groups of crawls which are analysed separately.
- In the Wizard Step 2 dialog box tick Download multiple sites/URLs in one combined crawl and click the Crawl Site with SocSciBot button.
- You will see the main multiple crawls screen. Check the Crawl web sites to a maximum depth option.
- Now you must enter a list of web sites to crawl. Please try a small example first and when this works, repeat this entire page for your web sites. The small example is the set of web sites: http://linkanalysis.wlv.ac.uk, http://socscibot.wlv.ac.uk, http://lexiurl.wlv.ac.uk. [you can use your own set of URLs instead if you like] Enter these web site URLs into a plain text file using Windows Notepad (in the Accessories program group), with one line per URL. The file should look like this.
- Now click the Load list of URLs to Crawl button and select the file with the list of URLs that you have just saved on your computer.
- Next you will be asked a strange question: select 0 for this one. [This relates to what count as the "nodes" on the network diagram]
- Now click the Quick Crawl button. This starts the web crawling, which will take a few minutes for these small sites. You can read information about the crawl in the title bar at the top during the crawl and also at the end of the crawl. Note that the Quick Crawl button works the same as the Crawl Above Sites/URLs button except that it times out after a given period of time. Use the Crawl Above... button whenever you are completing a crawl for a research project rather than for practice.
- Click OK to shut down SocSciBot when the crawl is complete. You have now crawled pages from three small web sites. The next stage is to create the link network.
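The URL list file from Step 1 can also be produced with a short script instead of Notepad. The following is a minimal Python sketch, assuming only what the step describes: a plain text file with one URL per line. The file name `urls.txt` and the function name `write_url_list` are hypothetical, chosen for illustration.

```python
from pathlib import Path

# The three example sites from this tutorial; replace with your own URLs.
SITES = [
    "http://linkanalysis.wlv.ac.uk",
    "http://socscibot.wlv.ac.uk",
    "http://lexiurl.wlv.ac.uk",
]


def write_url_list(path, urls):
    """Write one URL per line to a plain text file and return the text written."""
    # Drop blank entries and stray whitespace so every line holds exactly one URL.
    cleaned = [u.strip() for u in urls if u.strip()]
    text = "\n".join(cleaned) + "\n"
    Path(path).write_text(text, encoding="ascii")
    return text


if __name__ == "__main__":
    # "urls.txt" is a hypothetical file name; any plain text file should do.
    print(write_url_list("urls.txt", SITES), end="")
```

The resulting file holds the three URLs on three separate lines and can then be selected with the Load list of URLs to Crawl button.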
Step 2: Create the network diagram
- Start up SocSciBot Tools by double clicking on the SocSciBot4 or SocSciBot4.exe file again. This should take you straight through to Wizard Step 1. Click on test to select this project to analyse.
- Select Analyse LINKS in Project with SocSciBot Tools from the Wizard Step 2 to start the link analysis process.
- You will be asked if you want to calculate the link analysis reports for the project (the three web sites crawled). Answer Yes to this question.
- Next you will be asked if you want to standardise home page file names in your data. This improves the results by treating different versions of a web site home page as the same page for the analysis. Click Yes standardise home page file names and then wait a few seconds for the reports to be calculated.
- After a few seconds, the reports will have been calculated and you can view them using the tabbed sections in the lower half of the screen. To see the network, click the Show Site Network button.
- You should now see the network below (perhaps arranged differently). All these web sites link to each other so there are arrows between them all.
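To make the ideas behind Step 2 concrete, the sketch below illustrates home page standardisation and the reduction of page-level links to arrows between sites. It is a minimal Python illustration of the concepts, not SocSciBot's own implementation: the list of home page file names, the function names, and the example links in the usage note are all assumptions made for this example.

```python
from urllib.parse import urlsplit

# Assumed set of file names treated as a home page; SocSciBot's own list may differ.
HOME_PAGE_NAMES = {"index.html", "index.htm", "default.html", "default.htm", "home.html"}


def standardise_home_page(url):
    """Map variants such as http://site/index.html and http://site to http://site/.

    This sketch drops any query string or fragment and also standardises
    index pages inside subfolders.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in HOME_PAGE_NAMES:
        path = head + "/"
    return f"{parts.scheme}://{parts.netloc.lower()}{path}"


def site_network(page_links):
    """Collapse (source page, target page) links into directed site-to-site edges."""
    edges = set()
    for source, target in page_links:
        source_site = urlsplit(source).netloc.lower()
        target_site = urlsplit(target).netloc.lower()
        if source_site != target_site:  # links inside one site are not arrows between sites
            edges.add((source_site, target_site))
    return edges
```

For example, `standardise_home_page("http://socscibot.wlv.ac.uk/index.html")` and `standardise_home_page("http://socscibot.wlv.ac.uk")` give the same result, so links to either version count as links to one page. Passing a list of (source, target) page URL pairs to `site_network` gives one directed edge per pair of sites that link, which corresponds to one arrow in the diagram.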
Rearranging the network: You can move the nodes around to rearrange the network or right click on a node to get a list of properties that you can change. Please experiment with the right click menu and other menu options to see how they work. For large networks, please try the Automatic option in the Layout menu. More information about the network drawing tool is here.
THE END
You have now finished! Try the above again for your own set of web sites or continue below for more information (optional).
Extra information about the web site network
- Please see tutorial 1 (especially the end) for information about the other networks that SocSciBot can create.
Notes
The steps of this tutorial apply equally to small and large projects. The only difference is that for a large project, the site crawls may take a significant time, as may the data processing by SocSciBot Tools and Cyclist. Extra information is available about features specific to large projects.