This is ASPSeek FAQ.

Q: I compiled and installed ASPSeek without any errors, and I have latest MySQL
up-n-running, but then I run index, it complaints like this:

index: error while loading shared libraries: libmysqlclient.so.10: cannot
open shared object file: No such file or directory

A: This is not ASPSeek bug. You just haven't tell dynamic object linker
where to get MySQL client library. Please find location of libmysqlclient.so*
on your machine and put the path to it to /etc/ld.so.conf. For example, on my
box I have MySQL installed to /home/mysql, so I did
# echo "/home/mysql/lib/mysql" >> /etc/ld.so.conf
After adding that path you should run
# /sbin/ldconfig -v | grep libmysqlclient
If you'll see something like 
        libmysqlclient.so.10 -> libmysqlclient.so.10.0.0
then everything is ok now and you can use ASPSeek.

Another way is to set environment variable LD_LIBRARY_PATH to contain the path
to libmysqlclient.so before running index and searchd. Please consult docs
for dynamic linker for more information.


Q: For many hours, I am indexing a large site and I notice that even if
the database is updated, when I try to search, s.cgi does not seem to
retrieve any expected results. I stopped the index, but s.cgi still can't
find anything. What's wrong? Should I wait till indexer finishes?

A: Yes, you should. Or, if your site(s) is/are really very big and you
can't wait to try the search, you can stop the index (by running index -E
in other terminal) and run index -D. This will put the indexed data from
delta files used for indexing to the database used for searching, and
perform some other needed things, like ranks and lastmods calculations.
On a low-end hardware with small amounts of RAM, you can use sequence of
index -n <num>; index -D commands in loop, there num is number of URLs
index should process before terminating. Note that if index -D runs for
long long time, you need to read the next Q/A.

Q: I feel that on my bigbox index is slow. I have a lots of memory, what's
wrong?

A: Probably you haven't tuned your SQL server, so it uses too little memory.
If you use MySQL, try to increase mysqld buffer_size. You can do that by
passing -O key_buffer=xxxM option to safe_mysqld, there xxx is number of
megabytes of RAM that MySQL will use for its key buffer. We have set it to
256M on www.aspseek.com. Please consult MySQL manual for more information.

Other way to speed up index process if you are indexing many servers is to
specify -N <num> option to index. This will create <num> threads working at
the same time indexing different servers. It makes no sense to specify <num>
greater than number of servers being indexed. You can also use -R <num>
option to increase number of DNS resolvers, if you see way too many
"Waiting for resolver" messages from index.


Q: My binaries are way TOO big. What's wrong with it?

A: Nothing. This is mostly debug info produced with -g switch to compiler.
You can either set CXXFLAGS environment variable to something not containing
-g just before running configure, or you can "make install-strip" instead of
"make install" to strip your binaries while installing (this is a good idea
anyway if you don't have plans to debug ASPSeek).


Q: index -N xxx do not help to increase speed of indexing. What's wrong?

A: You probably have only one server, and index is written to do only
one request at a time to one site, like any other polite robot. Some people
even don't like this, so you can use MinDelay aspseek.conf command to further
reduce Web server load.

If it is your local server, yes, you can do anything with it, including many
simultaneous requests. But ASPSeek just can't do it now, maybe we'll implement
it as an option in future versions.


Q: What should I do to setup ASPSeek so it will be able to search for words
in my native language?

A: First, edit charsets.conf to uncomment all needed charsets and their
aliases (CharsetTable and CharsetAlias commands). Second, put the appropriate
LocalCharset command in charsets.conf and in s.htm (you should also uncomment
one CharsetTable, corresponding to your local charset). It is also a good
idea to uncomment StopwordsFile for your language.

In case you have to index misconfigured web servers, which do not tell
charset in their replies, please compile ASPSeek with --enable-charset-guesser
switch to "configure" script.


Q: I have reached the MySQL table size limit with urlword table. It's about
4 Gb now. Is it possible to fix it so I can index more URLs?

A: Yes. By default MySQL table size is limited to about 4Gb, but it can be
changed. To get an increase over 4Gb in Max_data_length you need to set
MAX_ROWS large enough that AVG_ROW_LENGTH * MAX_ROWS is greater than 4Gb.

The following example illustrates how it can be done from command line mysql
client:

mysql> show table status;
+------------+ ... ----------------+-----------------+ ...
| Name       |      Avg_row_length | Max_data_length |
+------------+ ... ----------------+-----------------+ ...
| urlword    |                 152 |      4294967295 |
+------------+ ... ----------------+-----------------+ ...
mysql> alter table urlword MAX_ROWS=100000000;
mysql> show table status;
+------------+ ... ----------------+-----------------+ ...
| Name       |      Avg_row_length | Max_data_length |
+------------+ ... ----------------+-----------------+ ...
| urlword    |                 152 |   1099511627775 |
+------------+ ... ----------------+-----------------+ ...


Q: If I start index for the second time, will all documents need to be
reindexed all over again?

A: No, index is wise. First, it uses Period command from aspseek.conf,
and tries to reindex document only if that period is expired.

Second, if period is really expired (or you run index with -a flag), it sends
If-Modified-Since and If-None-Match headers with the values that were in
Last-Modified and ETags headers sent by server during previous indexing, so
web server double checks that document was really modified, and sends it
only if it really was. Also, if server sends Expires header and the its value
is more than (now + Period from aspseek.conf), index use that value instead
of now + Period. For more details on these headers, see RFC 2616.

Of course, it all depends on server, and dynamic content usually gets
resended anyway, even if it's absolutely the same, because server doesn't
know anything about dynamic content. To avoid reparsing such document,
index first calculates its md5 checksum, and if it's not changes, no
parsing it done. This improves indexing speed a bit.


Q: What are the files in var/<DBName>/NNw/ used for?

A: These are files of reverse index. It is these files that searchd uses
for search itself. Each file contains information about particular word.
File is created only if it's size > 1K, otherwise data is stored in the BLOB
"wordurl.urls". Name of each file is the word ID that stored in the field
"wordurl.word_id".


Q: How can I restrict search to different sets of sites (for example,
"Linux sites", "Windows sites" etc.)?

A: You can use Web spaces for it. However, we do not yet have a front-end for
dealing with spaces. So, to create web spaces (groups) and assign sites to
them, please perform the following steps:

a) Assign numeric ID to spaces, so let's group "Linux" will have ID 1 and
group "Windows" will have ID 2.

b) Insert into database table "spaces" records with appropriate fields
space_id (1 or 2) and site ID (site ID can be obtained from field
sites.site_id) using SQL INSERT command, example:
  INSERT INTO spaces(space_id, site_id) VALUES (1, 1)
and so on.

c) Run "index -B". This command will create binary files spN, where N is the
space ID in the directory var/<DBName>/subsets.
These files are used by "searchd" if search is restricted by space(s). Note
that these files are also created after saving of delta files (index -D).
Now database is ready to search within web spaces.

To search within particular web space(s) use spN=on parameter of s.cgi,
where N is the number of web space. You can use multiple spN parameters.
If values of all spN parameters is off, then search will not be restricted
by any space.

Examples:
s.cgi?sp1=on&sp2=off&...
s.cgi?sp1=on&sp2=on&...
s.cgi?sp1=off&sp2=off&...


Q: Your software does not work (searchd dumps core, index does not do its work
etc.). What should I do?

A: Our stable releases are really stable, so if you are seeing some really
weird bugs, please first try to recompile ASPSeek without compiler
optimization. This is needed because some versions of C++ compilers
miscompiles some pieces of C++ code (this can be either ASPSeek source code
or inline functions in STL headers), which leads to bugs and instability.

To compile ASPSeek without optimizations, please do the following
CXXFLAGS="-O0 -g" ./configure

If ASPSeek still behaves badly after that, there are chances that you have
really found a bug. In that case, please read the next Q-A.


Q: How can I put a number of documents available for search to search page?

A: Please use the number from var/<DBName>/total file. This file is generated
each time after saving delta files, and contains total number of indexed not
empty URLs. Trust us - this is the only true and fair number.


Q: I have found a bug in ASPSeek. What should I do?

A: First, please read the previous Q-A. If you are still thinking it's a bug,
please report it to developers <aseek@sw.com.sg> or use feedback form on
ASPSeek home page. Please provide all necessary information when submitting
a bug report. Just telling "I can't use your program" or "index dumps core"
is not enough.

So, you should include:
* appropriate configuration files or relevant fragments;
* URLs which you've tried to index if it is accessible from Internet,
  or some idea of what data do you have there if it's in your local intranet;
* some hardware details, like CPU, memory, disks information - all that affects
  speed and portability;
* operating system name and version;
* versions of C++ compiler, make utility, SQL server;
* some notes about SQL server configuration if it differs from default.
