| 1. |
Robot-Manager parses my log file, but when it tries to display the Spider Visit Detail tree it takes a long time and seems to lock up.
One of two things can be happening here:
- Your log file format is being detected incorrectly, but Robot-Manager thinks it got it right.
- You are analyzing a *very* large file (half a gigabyte or more) and have too many spider visits to display in the window.
Not to worry, though. There are a couple of ways around this:
|
| 2. |
When I enter my authorization code, I get a timeout message?
If your authorization code is not working, chances are you are running personal firewall software such as ZoneAlarm. These applications can block a program's access to the internet. You will need to configure your firewall software to allow Robot-Manager access to the internet. See the documentation for your firewall software on how to do this.
|
| 3. |
Who is the spider ia_archiver?
ia_archiver is the spider from www.archive.org, home of the appropriately named "Wayback Machine". It continually indexes the internet and keeps archives of entire web sites, some as far back as 1995. Go check it out. If your site has been up for a while, they will have complete copies indexed for past years.
|
| 4. |
Spiders are attempting to index pages not on my site?
This is not uncommon at all. If your log file format is correct, these pages show up with a red 'X' through the page icon in the tree view. Mouse over the page to see a description of the page error. A 404 indicates that the page is missing from your web site. The spider that is trying to index that page probably indexed it a while back when it existed and is now coming back to update its index. In this case, the missing page is usually dropped from the search engine.
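To make the 404 case concrete, here is a minimal Python sketch that scans log lines for requests answered with a 404. It assumes the NCSA combined log format. The sample entries and the `find_missing_pages` helper are hypothetical and are not how Robot-Manager itself detects these pages.

```python
import re

# Assumption: NCSA combined log format. Your server's format may differ.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def find_missing_pages(lines):
    """Return (path, user_agent) pairs for every request answered with a 404."""
    missing = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") == "404":
            missing.append((match.group("path"), match.group("agent") or ""))
    return missing

# Hypothetical sample entries, for illustration only.
sample_log = [
    '10.0.0.1 - - [10/Oct/2003:13:55:36 -0700] "GET /old-page.html HTTP/1.0" 404 512 "-" "ia_archiver"',
    '10.0.0.2 - - [10/Oct/2003:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Scooter/3.3"',
]

missing_pages = find_missing_pages(sample_log)
for path, agent in missing_pages:
    print(f"{agent} requested missing page {path}")
```

Only the first sample entry is reported, since the second request returned a 200.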
|
| 5. |
Robot-Manager stopped detecting spider visits?
Your log file format has probably changed. You can correct this in one of two ways. First, you can start a new project to analyze the newer log files. This new project will detect the new log file format. Second, you can change the log file format in your current project to match the new format. To do this, load your project file into Robot-Manager and select Tools|Log File Format from the main menu. When the Log File Format dialog appears, change the index values to match those of the new log file format. The sample entry to the right is in the old format, so you may need to examine the new log file to determine the format.
|
| 6. |
How do I add robots.txt to GoLive?
To add your new robots.txt file to your Site Document in GoLive, do the following:
- Start GoLive and open your Site Document for your web site.
- Make sure you have saved your robots.txt file to the root directory of your web site. If you have not done this yet, go back to Step 3 - robots.txt and save your robots.txt file.
- Now with your Site window open and the Files tab showing, right-click in the tree view and choose Update from the popup menu. This will rescan your root folder and add your new robots.txt file to your web site project.
- You should now see your new robots.txt file in the tree view (usually at the bottom, depending on your sort option).
Note:
Once the file has been added to your site, use the FTP Server commands to connect to your site and upload the new file.
|
| 7. |
How do I add robots.txt to FrontPage?
To add your new robots.txt file to your FrontPage web site, do the following:
- Make sure you have saved your robots.txt file to the root directory of your web site. If you have not done this yet, go back to Step 3 - robots.txt and save your robots.txt file.
- Start FrontPage and open your web site. FrontPage will automatically add your new robots.txt file to the web site.
Note:
Once the file has been added to your site, use the Publish Web command to connect to your site and upload the new file.
|
| 8. |
How do I allow all spiders full access?
There are two ways to allow all spiders full access to your web site. You can leave the robots.txt file out of your web site altogether, or you can do the following:
- If it is not already open, start Robot-Manager and create a new project.
- Click Save To robots.txt on the toolbar.
Resulting robots.txt File
User-agent: *
Disallow:
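
If you want to double-check a file like this outside Robot-Manager, one option is Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the `parse_robots` helper and the spider name `AnySpider` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Same content as the resulting robots.txt file shown above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

def parse_robots(text):
    """Build a parser from robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

allow_all = parse_robots(ALLOW_ALL)

# "AnySpider" is a hypothetical spider name; an empty Disallow blocks nothing.
print(allow_all.can_fetch("AnySpider", "http://www.mysite.com/index.html"))  # True
```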
|
| 9. |
How do I exclude all spiders?
To exclude all spiders from your web site, do the following:
- If it is not already open, start Robot-Manager and create a new project.
- Click Disallow in the Project Steps sidebar.
- Check the option "Disallow access to this site for the selected spider".
- Click Save To robots.txt on the toolbar.
Resulting robots.txt File
User-agent: *
Disallow: /
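
As a quick sanity check, the following Python sketch feeds the same two lines to the standard `urllib.robotparser` module and asks whether a few spiders may fetch a page. The spider names and URL are placeholders chosen for illustration, and real spiders may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Same content as the resulting robots.txt file shown above.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""

blocked = RobotFileParser()
blocked.parse(DISALLOW_ALL.splitlines())

# Placeholder spider names; every one of them falls under the "*" record.
results = {
    spider: blocked.can_fetch(spider, "http://www.mysite.com/index.html")
    for spider in ("Scooter", "Acoon Robot", "ia_archiver")
}
print(results)
```

Every value in `results` comes back `False`, because `Disallow: /` matches every path on the site.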
|
| 10. |
How do I direct country specific spiders?
Many sites today are multilingual and need to manage where spiders visit their site. If you have both German and English content on your web site, you don't want a German spider visiting and attempting to index your English content. It may give up and leave your site even though you provide German content. Here's how you can direct those spiders to the relevant content of your site.
This example assumes a web site directory structure like the following:
http://www.mysite.com/index.html
http://www.mysite.com/english/index.html
http://www.mysite.com/german/index.html
This will be a simple site. The home page in the root directory gives the user a choice of either English or German. Depending on their choice, they are redirected to the home page in either the english or german directory. The home pages in these directories are language specific.
Again, for simplicity, we will choose only three spiders: the standard All Spiders, plus one English and one German spider. Here are the spiders we chose:
All Spiders
Acoon (German)
AltaVista (English)
We'll jump ahead for a moment and show you what the resulting robots.txt file should look like. We have stripped out the comments for brevity's sake.
User-agent: Acoon Robot
Disallow: /english/
Disallow: /index.html

User-agent: Scooter
Disallow: /german/
Disallow: /index.html

User-agent: *
Disallow:
To explain, the first section for Acoon tells its spider to ignore the english directory and the home page. There is no need for the spider to index the home page since we are merely using it to redirect users to the appropriate language specific home page. Same goes for Scooter. It doesn't need to index the home page or any content under the german directory. All other spiders are given full access to the web site.
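The behavior described above can be sketched with Python's standard `urllib.robotparser` module. This is an illustration only: the spider name `OtherSpider` is made up, and real spiders may match user-agent names differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt file, with a blank line between records.
ROBOTS_TXT = """\
User-agent: Acoon Robot
Disallow: /english/
Disallow: /index.html

User-agent: Scooter
Disallow: /german/
Disallow: /index.html

User-agent: *
Disallow:
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

SITE = "http://www.mysite.com"
PAGES = ["/index.html", "/english/index.html", "/german/index.html"]

def allowed_pages(spider):
    """Return the example pages this spider is allowed to fetch."""
    return [page for page in PAGES if rules.can_fetch(spider, SITE + page)]

# "OtherSpider" is a hypothetical name standing in for any unlisted spider.
for spider in ("Acoon Robot", "Scooter", "OtherSpider"):
    print(spider, "->", allowed_pages(spider))
```

Acoon is left with only the german page, Scooter with only the english page, and the unlisted spider with all three.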
We could take it a step further and disallow access to the home page for all spiders. After all, this page really has no content and merely redirects users to the appropriate language-specific home page.
This was a simplified example, but it does lay the foundation for a more complex robots.txt exclusion file. It may even give you ideas on how to organize your web site to be multilingual.
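One way that extra step might look is sketched below in Python, again using the standard `urllib.robotparser` module to check the result. The changed catch-all record is our assumption of how you would write it by hand, and `OtherSpider` is a made-up name.

```python
from urllib.robotparser import RobotFileParser

# Variant of the example file: the catch-all record now blocks the home page too.
ROBOTS_TXT_VARIANT = """\
User-agent: Acoon Robot
Disallow: /english/
Disallow: /index.html

User-agent: Scooter
Disallow: /german/
Disallow: /index.html

User-agent: *
Disallow: /index.html
"""

variant = RobotFileParser()
variant.parse(ROBOTS_TXT_VARIANT.splitlines())

SITE = "http://www.mysite.com"

# "OtherSpider" is a hypothetical name standing in for any unlisted spider.
home_allowed = variant.can_fetch("OtherSpider", SITE + "/index.html")
english_allowed = variant.can_fetch("OtherSpider", SITE + "/english/index.html")
root_allowed = variant.can_fetch("OtherSpider", SITE + "/")
print(home_allowed, english_allowed, root_allowed)
```

Note that rules are matched by path prefix, so `Disallow: /index.html` does not cover a request for the bare `/` URL. If your server also serves the home page at `/`, that address remains open to spiders.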
|