
=item BuildIndex()

Usage:
	&BuildIndex();

BuildIndex completely rebuilds the index for a local realm.  Because the webpages in local realms are readily accessible, this function tends to process huge data sets quickly.  It is self-restartable through a meta-refresh; state information is stored in the $start_pos parameter and working data is stored either in the database or the index_file.working_copy file.

For file-based indexes, all new data is written to index_file.working_copy.  When the process is finished, possibly after several browser requests, the original index_file is deleted and index_file.working_copy is renamed over the top of it.  Thus, users are able to perform searches on the intact index_file while the BuildIndex process is in progress.  In addition, it is possible to safely abandon the BuildIndex process.

For SQL-based indexes, we don't have that concept of a temporary storage area.  Instead, each record is updated as the webpage is encountered.  At the end of the BuildIndex process, if we get there, we delete all records whose lastindex time is older than "start_time".  The only records older than "start_time" are those that were not detected by GetFilesByDirEx, or that were excluded for other reasons.
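
The file-based working-copy swap can be sketched as follows.  This is a minimal illustration only: the helper name finish_working_copy is hypothetical, it assumes all new records are already in memory, and it omits the LockFile locking and the multi-request restart logic that BuildIndex actually uses.

```perl

	use strict;
	use warnings;

	# Hypothetical helper: write the new records to "$index_file.working_copy",
	# then move the working copy over the live index.  Searches keep reading
	# the intact $index_file until the final rename.
	sub finish_working_copy {
		my ($index_file, @records) = @_;
		my $working = "$index_file.working_copy";
		open(my $fh, '>', $working) or return "unable to write '$working' - $!";
		print {$fh} "$_\n" for @records;
		close($fh) or return "unable to close '$working' - $!";
		# On systems where rename() refuses to replace an existing file,
		# delete the original first and try again.
		unless (rename($working, $index_file)) {
			unlink($index_file);
			rename($working, $index_file)
				or return "unable to rename '$working' to '$index_file' - $!";
		}
		return '';    # empty string means success
	}

```

The live index_file is only replaced by the final rename, so abandoning the process before that point leaves it untouched.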

This is an interactive function; errors and other status messages are shown to the user by printing HTML.

Dependencies:
	$const{'script_start_time'}
	$realms->hashref
	%FORM
	%Rules
	&FormatNumber
	&get_dbh
	package LockFile
	package fdse_filter_rules

=cut


=item Capitalize($)

Usage:

	my $cap_string = &Capitalize($str);

Capitalizes English-language strings.

=cut


=item CheckEmail($)

Usage:
	my $err_msg = &CheckEmail( $address );
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

Checks whether the argument is a valid email address:
	address is not blank
	address contains text @ text
	text following the @ is a valid hostname (can be resolved)

Based on Ian Dobson's CheckEmail function.
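
The three checks listed above might look like this in Perl.  This is a sketch, not Ian Dobson's code: the name check_email_sketch is hypothetical, and it assumes the hostname test uses Perl's built-in gethostbyname, whereas the real function may resolve hosts differently.

```perl

	use strict;
	use warnings;

	# Hypothetical sketch: returns '' if $address passes, else an error message.
	sub check_email_sketch {
		my ($address) = @_;
		$address = '' unless defined $address;
		$address =~ s/^\s+|\s+$//g;
		return 'email address is blank' unless length $address;
		# require "text@text": exactly one @ and no whitespace
		my ($user, $host) = $address =~ m/^([^\s\@]+)\@([^\s\@]+)$/
			or return "'$address' is not of the form user\@host";
		# the host part must resolve to an IP address
		return "hostname '$host' cannot be resolved"
			unless gethostbyname($host);
		return '';
	}

```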

=cut


=item DeleteFromPending($$)

Usage:
	my ($err_msg, $delcount) = &DeleteFromPending( $realm, \@urls );

=cut


=item FlockEx($$)

Usage:
	if (&FlockEx( $p_filehandle, 8 )) {
		# okay
		}

Abstraction layer to protect non-flock systems.

=cut


=item FormatDateTime

Usage:
	my $date_str = &FormatDateTime( time(), $format_type, $b_format_as_gmt );

Written to model Microsoft's FormatDateTime function for vbscript and jscript.  See:
	http://msdn.microsoft.com/
	http://msdn.microsoft.com/scripting/
	http://msdn.microsoft.com/scripting/vbscript/doc/vsfctFormatDateTime.htm

The following VBScript prints each standard format for the current time:

	dim x
	for x = 0 to 4
		WScript.Echo x & ": " & FormatDateTime( Now(), x )
	next

$format_type is one of:

	0: 12/12/2000 10:46:55 PM
	1: Tuesday, December 12, 2000
	2: 12/12/2000
	3: 10:46:55 PM
	4: 22:46

Added the following to meet my specific needs:

	10: Wed 11/1/2000 1:18 PM (short & clean)
	11: Wed, 1 Nov 2000 13:18:00 -0000 (SMTP protocol date format)
	12: 2000-11-01 13:18:00 (MySQL format)
	13: Perl native format / scalar localtime
	14: 12/12/2000 22:46 (tight format)

Dependencies:
	none
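
As an illustration of the custom formats, the sketch below implements types 4, 11, 12 and 13 only.  It assumes English day and month names regardless of locale and uses Time::Local to compute the local UTC offset for type 11; the name format_date_time_sketch is hypothetical and does not reflect how FormatDateTime itself is written.

```perl

	use strict;
	use warnings;
	use Time::Local qw(timegm);

	my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
	my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

	# Hypothetical sketch covering format types 4, 11, 12 and 13 only.
	sub format_date_time_sketch {
		my ($time, $format_type, $b_format_as_gmt) = @_;
		my ($sec, $min, $hour, $mday, $mon, $year, $wday)
			= $b_format_as_gmt ? gmtime($time) : localtime($time);
		$year += 1900;
		if ($format_type == 4) {     # 22:46
			return sprintf('%02d:%02d', $hour, $min);
		}
		if ($format_type == 11) {    # Wed, 1 Nov 2000 13:18:00 -0000
			my $zone = '-0000';
			unless ($b_format_as_gmt) {
				# local offset from UTC in minutes, e.g. -300 for UTC-5
				my $offset = int((timegm(localtime($time)) - $time) / 60);
				$zone = sprintf('%s%02d%02d', ($offset < 0 ? '-' : '+'),
					int(abs($offset) / 60), abs($offset) % 60);
			}
			return sprintf('%s, %d %s %d %02d:%02d:%02d %s', $DAYS[$wday],
				$mday, $MONTHS[$mon], $year, $hour, $min, $sec, $zone);
		}
		if ($format_type == 12) {    # 2000-11-01 13:18:00
			return sprintf('%04d-%02d-%02d %02d:%02d:%02d',
				$year, $mon + 1, $mday, $hour, $min, $sec);
		}
		if ($format_type == 13) {    # Perl native format
			return $b_format_as_gmt ? scalar gmtime($time) : scalar localtime($time);
		}
		die "format type $format_type is not covered by this sketch\n";
	}

```

For 973084680 (Wed, 1 Nov 2000 13:18:00 UTC) with $b_format_as_gmt set, type 12 yields "2000-11-01 13:18:00" and type 11 yields "Wed, 1 Nov 2000 13:18:00 -0000".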

=cut


=item FormatNumber($$$$$$)

Usage:
	my $num_str = &FormatNumber( $expression, $decimal_places, $include_leading_digit, $use_parens_for_negative, $group_digits, $euro_style );

Arguments

$expression
	Required. Expression to be formatted.

$decimal_places
	Optional. Numeric value indicating how many places to the right of the decimal are displayed.
	Note: truncates $expression to $decimal_places, does not round.

$include_leading_digit
	Optional. Boolean that indicates whether or not a leading zero is displayed for fractional values.

$use_parens_for_negative
	Optional. Boolean that indicates whether or not to place negative values within parentheses.
	Style is used for outbound formatting only; inbound parsing always uses "-" for negative numbers (Perl's internal format)

$group_digits
	Optional. Boolean that indicates whether or not numbers are grouped using the comma.

$euro_style
	Optional. If 1, then "." separates thousands and "," separates decimals, e.g. "800.234,24" instead of "800,234.24".
	Style is used for outbound formatting only; inbound parsing always uses "." for dec (Perl's internal format)

Prototyped to match Microsoft's FormatNumber function for vbscript/jscript, with the limitation of not knowing about default settings.

Microsoft specification at http://msdn.microsoft.com/scripting/vbscript/doc/vsfctFormatNumber.htm or from http://msdn.microsoft.com/scripting/.

Error handling:
	if $expression is not numeric, it is treated as 0

Dependencies:
	none
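
A minimal sketch of these rules follows.  It assumes string-based truncation (to avoid floating-point rounding) and a default of 0 decimal places when $decimal_places is omitted, whereas Microsoft's function would use regional defaults.  The name format_number_sketch is hypothetical.

```perl

	use strict;
	use warnings;

	# Hypothetical sketch of the argument handling described above.
	sub format_number_sketch {
		my ($expression, $decimal_places, $include_leading_digit,
			$use_parens_for_negative, $group_digits, $euro_style) = @_;
		$decimal_places = 0 unless defined $decimal_places;    # assumed default
		my $str = defined $expression ? $expression : '';
		$str =~ s/^\s+|\s+$//g;
		# anything that is not a plain decimal number is treated as 0
		$str = '0' unless $str =~ /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
		my $negative = ($str =~ s/^-//);
		my ($int, $frac) = split /\./, $str, 2;
		$int = '0' unless defined $int && length $int;
		$int =~ s/^0+(?=\d)//;                                  # "007" => "7"
		$frac = '' unless defined $frac;
		# truncate (do not round) to $decimal_places digits, padding with zeros
		$frac = substr($frac . ('0' x $decimal_places), 0, $decimal_places);
		$negative = 0 unless ($int . $frac) =~ /[1-9]/;         # no "-0.00"
		my ($thousands, $decimal) = $euro_style ? ('.', ',') : (',', '.');
		if ($group_digits) {
			1 while $int =~ s/^(\d+)(\d{3})/$1$thousands$2/;
		}
		$int = '' if !$include_leading_digit && $int eq '0' && $decimal_places > 0;
		my $result = $decimal_places > 0 ? "$int$decimal$frac" : $int;
		return $result unless $negative;
		return $use_parens_for_negative ? "($result)" : "-$result";
	}

```

For example, format_number_sketch(800234.245, 2, 1, 0, 1, 1) returns "800.234,24", matching the $euro_style description above.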

=cut


=item GetAbsoluteAddress

Usage:

	my ($absolute_url) = &GetAbsoluteAddress($link_fragment, $full_url_context);

For example, if you spider "http://xav.com/foo/bar/index.html" and find a link
to "../nikken.txt", then:

	print &GetAbsoluteAddress("../nikken.txt", "http://xav.com/foo/bar/index.html");
	# prints http://xav.com/foo/nikken.txt

Dependencies:
	&parse_url
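
A simplified resolver for the example above could be written as follows.  It is a sketch under stated assumptions, not the real &GetAbsoluteAddress (which relies on &parse_url): the name get_absolute_address_sketch is hypothetical, and it does not handle scheme-relative links ("//host/x") or query-only links ("?x").  RFC 3986, section 5, gives the complete resolution algorithm.

```perl

	use strict;
	use warnings;

	# Hypothetical, simplified resolver for absolute, root-relative ("/x")
	# and document-relative ("x", "../x") links.
	sub get_absolute_address_sketch {
		my ($link, $base) = @_;
		return $link if $link =~ m{^[a-z][a-z0-9+.\-]*:}i;   # already absolute
		my ($root, $base_path) = $base =~ m{^([a-z][a-z0-9+.\-]*://[^/?#]+)([^?#]*)}i
			or return;
		$base_path = '/' unless length $base_path;
		my $path;
		if ($link =~ m{^/}) {
			$path = $link;
		} else {
			(my $dir = $base_path) =~ s{[^/]*$}{};             # folder of the base page
			$path = $dir . $link;
		}
		$path =~ s/#.*$//s;                                    # drop the fragment
		my @segments;
		foreach my $seg (split m{/+}, $path) {
			next if $seg eq '' || $seg eq '.';
			if ($seg eq '..') { pop @segments; next; }
			push @segments, $seg;
		}
		my $clean = '/' . join('/', @segments);
		$clean .= '/' if @segments && $path =~ m{/\.?\.?$};    # keep folder slash
		return $root . $clean;
	}

```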

=cut


=item GetCrawlList($$$$$)

Usage:
	my @list = ();
	my $count = 0;

	my $age = $FORM{'StartTime'};
	if ($FORM{'DaysPast'}) {
		$age -= (86400 * $FORM{'DaysPast'});
		}

	my $err_msg = &GetCrawlList( $realm, $age, $max_list_size, \@list, \$count );

Retrieves a @list of all web pages in the '$realm' realm that are older than $age.

$count is the size that @list would be if no limits were imposed.

@list will contain between 0 and $max_list_size elements.  The $max_list_size option is available to save memory.

Dependencies:
	package LockFile
	$const{'pending_file'}

=cut


=item LoadNodes

Usage:
	my ($err_msg, @p_Nodes) = &LoadNodes('file.xml', 'TagName', ALLOW_CACHE);

Accepts a filename argument for the structured XML file containing ad descriptions. Parses the file and returns an array of pointers to hashes, one hash for each advertisement. The hash structure looks like this:

	%AdHash = (
		'version' => '1.0',
		'weight' => '100',
		'keywords' => 'foo bar',
		'ident' => '002',
		'=' => '<center><a href=dot.com><img src=foo.gif></a></center>',
		);

Warning: LoadNodes() will replace \r and \n characters with spaces (\015|\012)

Dependencies:
	package LockFile

=cut


=item ParseRobotFile($$)

Usage:
	my @forbidden_paths = &ParseRobotFile( $RobotText, $my_user_agent );

Accepts the text of a robots.txt file, and the string name of the current HTTP user-agent. Parses through the file and returns an array of all forbidden paths that apply to the current user-agent.

Dependencies:
	&Trim
	&clean_path
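
One way to implement this, following the original robots exclusion convention (case-insensitive substring match on the User-agent value, with the "*" record used only when no record names this robot), is sketched below.  The name parse_robot_file_sketch is hypothetical; it ignores Allow lines and does not pass paths through &clean_path, both of which the real function may handle differently.

```perl

	use strict;
	use warnings;

	# Hypothetical sketch of robots.txt parsing for one user-agent.
	sub parse_robot_file_sketch {
		my ($robot_text, $my_user_agent) = @_;
		my $ua = lc $my_user_agent;
		my (@agents, $in_rules, $matched_specific, @specific, @default);
		foreach my $line (split /\015?\012|\015/, $robot_text) {
			$line =~ s/#.*//;                            # strip comments
			$line =~ s/^\s+|\s+$//g;
			next unless length $line;
			my ($field, $value) = $line =~ /^([^:]+?)\s*:\s*(.*)$/ or next;
			if (lc $field eq 'user-agent') {
				# a User-agent line that follows rule lines starts a new record
				@agents = () if $in_rules;
				$in_rules = 0;
				push @agents, lc $value if length $value;
				next;
			}
			$in_rules = 1;
			next unless lc $field eq 'disallow';
			my $is_specific = grep { $_ ne '*' && index($ua, $_) >= 0 } @agents;
			my $is_default  = grep { $_ eq '*' } @agents;
			$matched_specific = 1 if $is_specific;
			next unless length $value;                   # "Disallow:" alone allows all
			if    ($is_specific) { push @specific, $value }
			elsif ($is_default)  { push @default,  $value }
		}
		# records naming this robot override the "*" default record
		return $matched_specific ? @specific : @default;
	}

```

For example, given a file that disallows /cgi-bin/ for "*" and / for "BadBot", the user-agent "BadBot/1.0" gets ("/") while any other robot gets ("/cgi-bin/").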

=cut


=item PrintOrderedHash

Usage:
	my $err_msg = &PrintOrderedHash( \%hash, $by_value, $ascii_sort, $ascending, $date_map );

=cut


=item PrintTemplate($$$$$$)

Usage:
	&PrintTemplate( $b_return_as_string, 'tips.html', 'german', \%replace_values, \%visited, \%cache );

See "admin_help.html" for extensive documentation on this function, its limitations, its failure scenarios, etc.

Dependencies:
	&ReadFile
	&PrintTemplate
	$VERSION

=cut


=item RawTranslate

Usage:
	my $lc_ai_str = &RawTranslate($str);

Returns a lowercase, accent-stripped version of its input.  Replaces HTML-encoded characters with their ASCII equivalents.

This function is called mainly by &CompressStrip; also by &LoadRules when preparing the code for ignore words.

See http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html

Dependencies:
	$Rules{'accent sensitive'}
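
A rough sketch of the accent-insensitive case follows.  It assumes ISO-8859-1 input and handles only a subset of entities; the name raw_translate_sketch is hypothetical, and the behavior when $Rules{'accent sensitive'} is set is not modeled.

```perl

	use strict;
	use warnings;

	# Hypothetical sketch: decode a few HTML entities, strip Latin-1
	# accents, then lowercase.
	sub raw_translate_sketch {
		my ($str) = @_;
		# &eacute; => e, &Uuml; => U, &ccedil; => c, &Oslash; => O, ...
		$str =~ s/&([a-zA-Z])(?:acute|grave|circ|uml|tilde|cedil|ring|slash);/$1/g;
		$str =~ s/&nbsp;/ /g;
		$str =~ s/&quot;/"/g;
		$str =~ s/&lt;/</g;
		$str =~ s/&gt;/>/g;
		$str =~ s/&amp;/&/g;          # last, so "&amp;lt;" becomes "&lt;"
		# literal ISO-8859-1 accented letters => unaccented ASCII
		$str =~ tr/\xC0-\xC5\xC7\xC8-\xCB\xCC-\xCF\xD1\xD2-\xD6\xD8\xD9-\xDC\xDD\xE0-\xE5\xE7\xE8-\xEB\xEC-\xEF\xF1\xF2-\xF6\xF8\xF9-\xFC\xFD\xFF/AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy/;
		return lc $str;
	}

```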

=cut


=item ReadFile

Usage:

	my ($err_msg, $FileText) = &ReadFile($FilePath, $depth);
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg</P>";
		}
	else {
		print "<P>File '$FilePath' contains:</P>";
		print "<P>$FileText</P>";
		}

Cheap-n-nasty file-reading function. Calls super-robust LockFile object under the hood.

=cut


=item ReadInput()

Reads CGI form input, or command-line parameters.  Initializes %$p_FORM and assigns values.

Usage:
	&ReadInput(\%FORM);

Abstracts the source of the commands (can be query string, standard input, or command-line parameters).

Dependencies:
	%Rules
	%ENV
	@ARGV
	STDIN

=cut


=item SaveLinksToFileEx($$$$)

Usage:
	my $err_msg = &SaveLinksToFileEx(
		$p_realm_data,
		$ref_crawler_results,
		$ref_spidered_links,

		$ref_links_new,
		$ref_links_visited_fresh,
		$ref_links_visited_old,
		$ref_links_error,
		);
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

Saves all links from this crawl session to the pending pages file (search.pending.txt).

File format is:
	URL &url_encode(realm) number

where number is one of:
	0  => waiting to be indexed
	2  => encountered problems during index
	2+ => epoch time of the index operation

=cut


=item SelectAdEx($$)

Usage:
	my $p_Ads = &SelectAdEx( \@SearchTerms );

Returns the text for up to 4 ads, based on keyword matches with @SearchTerms.

=cut


=item SendMailEx

Usage:

	my $message = "Hi there Bob!\n\n"
		. "How has life been treating you?\n\n"
		. "Regards,\nSally\n";

	my ($err_msg, $trace) = &SendMailEx(
		'to'         => 'user@host.com',
		'to name'    => 'Bob User',     # *
		'from'       => 'me@host.com',
		'from name'  => 'Sally User',   # *
		'subject'    => 'Hi Bob',       # *
		'message'    => $message,
		'host'       => 'mail.foo.com', # *
		'port'       => 25,             # *
		);
	# * optional field

	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}
	else {
		print "<P><B>Success:</B> sent mail okay.</P>\n";
		}

	print "<P>Here is the trace:</P>\n\n";
	print "<XMP>\n$trace\n</XMP>\n";

Dependencies:
	use Socket;
	&get_mx
	&FormatDateTime

=cut


=item SetDefaults($%)

Usage:
	my $text = &SetDefaults( $html, \%params );

Takes $html, which is an HTML fragment including FORM elements, and sets all default attributes to match %params.

Requires strict format:

	<INPUT TYPE=radio NAME="name" VALUE="value">
	<INPUT TYPE=checkbox NAME="name" VALUE="value">
	<INPUT NAME="foo">
	<SELECT NAME="name".*?><OPTION VALUE="value"><OPTION VALUE="value"></SELECT>

Dependencies:
	&html_encode

=cut


=item StandardVersion

The following three functions return the HTML text for printing a single hit.  &StandardVersion() returns the normal text; &AdminVersion() returns the same text as &StandardVersion() with the addition of "Edit" and "Delete" buttons, and re-routes all links through the redirector.

Usage:
	my $textoutput = &StandardVersion(\@SearchTerms, %pagedata);

=cut


=item Suspend

Used for ReadWrite activity that spans multiple object lifetimes.  There are two relevant methods, Suspend and Resume.

Suspend saves the read/write depth of the related files to the $filename.exclusive_lock_request file.

Resume opens the files as ReadWrite would, but with the opposite checks (the .elr and .tmp files must exist).  It seeks to the appropriate places in the files before handing the handles back.


=cut


=item Trim

Usage:

	my $word = &Trim("  word  \t\n");

Strips whitespace and line breaks from the beginning and end of the argument.

=cut


=item UpdateIndex

For local realms.  Update procedure used to update all records.

Usage:
	($err_msg, $is_complete) = &UpdateIndex( $p_realm_data );

First call GetFiles2() to build a file of all the things.

Algorithm:

	(Must all be done in a single process... not restartable...)

	Use GetFiles() to create a list of all files and their lastmod times
	Build a hash of $lastmod{url} = time

	loop through all records in the existing index

		unless lastmod(url)
			delete record
			next

		file_lastmod = lastmod(url)
		delete lastmod(url)

		if (file_lastmod == lastmod_index)
			preserve record
		else
			(file = url) =~ s!^base_url!base_dir!o;
			record = build_new_record(file)
			update record

	foreach url (keys %lastmod)
		(file = url) =~ s!^base_url!base_dir!o;
		record = build_new_record(file)
		insert record
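
The merge loop can be sketched in Perl as a function that only decides what to do with each URL.  The name plan_index_update is hypothetical, records are assumed to be [url, lastmod_index] pairs, and building or writing records is left out.  Note that each file's lastmod time has to be read before its entry is removed from the hash.

```perl

	use strict;
	use warnings;

	# Hypothetical sketch: %$p_lastmod maps each URL found on disk to its
	# lastmod time; @$p_records holds [url, lastmod_index] pairs from the
	# existing index.  Returns [action, url] pairs.
	sub plan_index_update {
		my ($p_lastmod, $p_records) = @_;
		my %pending = %$p_lastmod;      # copy, so the caller's hash is untouched
		my @actions;
		foreach my $p_rec (@$p_records) {
			my ($url, $lastmod_index) = @$p_rec;
			unless (exists $pending{$url}) {
				push @actions, [ 'delete', $url ];    # file no longer exists
				next;
			}
			my $file_lastmod = delete $pending{$url}; # read, then remove
			push @actions, [ $file_lastmod == $lastmod_index ? 'preserve' : 'update', $url ];
		}
		# whatever is left was never indexed before
		push @actions, map { [ 'insert', $_ ] } sort keys %pending;
		return @actions;
	}

```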

=cut


=item add

This method will check for the existence of index files; if they don't exist, it will attempt to create a zero-byte file.  If the creation fails, it will not load the realm.

Dependencies:
	&main::url_encode
	&main::html_encode

=cut


=item admin_link(%)

Usage:
	my $link = &admin_link(
		'Action' => 'Foo',
		'Name' => 'Value',
		);

Returns an admin URL with the passed name-value parameters.  Will URL-encode the names and values.

=cut


=item admin_main()

Usage:
	$err_msg = &admin_main();

=cut


=item check_db_config($)

Usage:
	my ($err_msg, $addr_count, $realmcount, $log_exists) = &check_db_config($verbose);
	if ($err_msg) {
		print "<P><B>Error:</B> your database is not configured properly.</P>\n";
		print $err_msg;
		}

Returns a text error message if the database is not configured properly.

=cut


=item check_filter_rules

Usage:

	my $url_to_get = 'http://www.xav.com/';
	my $document_text = '';

	my $fr = new fdse_filter_rules;

	my ($is_denied, $requires_approval, $promote_val, $err_msg) = ();

	($is_denied, $requires_approval, $promote_val, $err_msg) = $fr->check_filter_rules( $url_to_get, '', 1);

	if ($is_denied) {
		print "<P>URL '$url_to_get' is denied - $err_msg</P>";
		exit;
		}

	$document_text = get( $url_to_get );

	($is_denied, $requires_approval, $promote_val, $err_msg) = $fr->check_filter_rules( $url_to_get, $document_text, 0);

	if ($is_denied) {
		print "<P>URL '$url_to_get' is denied - $err_msg</P>";
		exit;
		}

	if ($requires_approval) {
		#queue
		}
	else {
		# add to index
		}



=cut


=item clean_path($)

Function for stripping garbage from web page paths. It will collapse "." and
".." paths, collapse stacked /// slashes, and strip pound links.

Examples:

	"/foo/../bar/index.htm" => "/bar/index.htm"
	"/test.htm#top" => "/test.htm"
	"/../foo/bar" => "/foo/bar"
	"////top//level/../no_this/./file" => "/top/no_this/file"

This is used to cleanse links discovered in user input or in web pages that the
crawler visits. It is also used to clean forbidden paths in robots.txt
files (by cleaning both the original URL and the exclusion paths with the
same function, we minimize the risk of hitting an excluded path).

Dependencies:
	&Trim
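
The rules above can be sketched as a split-and-rebuild over the path segments.  The name clean_path_sketch is hypothetical, and preserving a trailing slash is an assumption that the examples do not cover.

```perl

	use strict;
	use warnings;

	# Hypothetical sketch of the clean-up rules listed above.
	sub clean_path_sketch {
		my ($path) = @_;
		$path =~ s/^\s+|\s+$//g;        # trim, like &Trim
		$path =~ s/#.*$//s;             # strip "#anchor" links
		my @out;
		foreach my $seg (split m{/+}, $path) {   # "///" acts like "/"
			next if $seg eq '' || $seg eq '.';
			if ($seg eq '..') {
				pop @out;                   # ".." above the root is ignored
				next;
			}
			push @out, $seg;
		}
		my $clean = '/' . join('/', @out);
		# keep a trailing slash so "/dir/" still names a folder
		$clean .= '/' if @out && $path =~ m{/\.?\.?$};
		return $clean;
	}

```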

=cut


=item compress_hash

Usage:
	&compress_hash( \%pagedata );

This function is solely responsible for initializing any time fields that haven't been set yet.  Time fields are: lastindex, lastmodtime, dd, yyyy, mm.

=cut


=item create_db_config($$)

Usage:
	my $err_msg = create_db_config($overwrite, $verbose);
	if ($err_msg) {
		print "<P><B>Error:</B> unable to create database configuration.</P>\n";
		print $err_msg;
		}

Attempts to create an FDSE database.  If $overwrite is true, then will overwrite existing data.

Returns an HTML multi-error message if the database cannot be created.

=cut


=item create_edit_rule()

Usage:
	&create_edit_rule();

Presents the HTML form for creating or editing a Filter Rule.  Handles submission of that form as well.

=cut


=item create_if_needed

This method/property controls whether 0-byte files will be created if none exist to read from.

=cut


=item create_sql_log

Usage:
	my ($err_msg) = &create_sql_log();

Creates a SQL table in the configured database, used to store the terms searched by visitors.

Dependencies:
	&db_exec

Author: Ian Dobson

=cut


=item delete_filter_rule($)

Deletes the filter rule '$name' from the internal array, and then saves the filter rules to disk.

Usage:
	my $err_msg = $FR->delete_filter_rule( $name );
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

=cut


=item format_term_ex

Usage:
	my ($type, $is_attrib_search, $str_pattern, $sql_clause) = &format_term_ex($user_entered_term, $default_type);

Returns:
	$type of 0 == ignored, 1 == forbidden, 2 == optional, 3 == required

	$is_attrib_search is 1 iff the term is like "title:foo" or "link:xav.com".

	$str_pattern is the pattern to match against the record to test for the term's existence

	$sql_clause is suitable for insertion in "SELECT * FROM $Rules{'sql: table name: addresses'} WHERE ($sql_clause) AND ($sql_clause)"
		examples: text LIKE '%foo%' or ut LIKE '%my phrase%'



=cut


=item frwrite()

Saves the filter rules to their file.

Usage:
	my ($err_msg) = $FR->frwrite();
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}


=cut


=item get_dbh($)

Creates an open database connection using the byref parameter.  Returns an error string on failure.

Usage:
	my $err_msg = &get_dbh( \$dbh );
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

=cut


=item get_default_name($$)

Usage:
	my ($defname, $deffile) = $realms->get_default_name( $base_url );

=cut


=item get_mx($)

Usage:
	my ($mailhost) = &get_mx( $hostname );

Accepts a hostname or email address, and returns its associated SMTP server.  Depends on a call to the 'nslookup' tool, which must exist and be in the path.

Sadly, this will not work on Win9x machines or Macs (because they don't ship with nslookup).  It will work on WinNT, Win2000, Unix/Linux.

If there is an error, returns an empty string.

Dependencies:
	`nslookup`

=cut


=item get_open_realm()

Usage:
	my ($err_msg, $p_realm_data) = $realms->get_open_realm();

Returns a realm object for the first open-style realm (type == 1).  If no open realms are defined, will create one and return a pointer to it, or an error regarding the failure to create a realm.

=cut


=item get_web_folder($)

Usage:
	my $url = &get_web_folder($url);

Takes a URL and reduces it to the folder descriptor:

	http://www.xav.com => http://www.xav.com/
	http://www.xav.com/~bob => http://www.xav.com/~bob/
	http://www.xav.com/~bob/index.html => http://www.xav.com/~bob/
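
The examples suggest the following heuristic, sketched here under that assumption: a last path segment containing a "." is a file name to drop, and anything else is a folder that gets a trailing slash.  The name get_web_folder_sketch is hypothetical.

```perl

	use strict;
	use warnings;

	# Hypothetical sketch of reducing a URL to its folder.
	sub get_web_folder_sketch {
		my ($url) = @_;
		$url =~ s/[?#].*$//s;                         # ignore query and fragment
		my ($root, $path) = $url =~ m{^([a-z][a-z0-9+.\-]*://[^/]+)(.*)$}i
			or return $url;
		$path = '/' unless length $path;
		if ($path !~ m{/$}) {
			if ($path =~ m{/[^/]*\.[^/]*$}) {
				$path =~ s{[^/]*$}{};                     # drop "index.html"
			} else {
				$path .= '/';                             # "~bob" is a folder
			}
		}
		return $root . $path;
	}

```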

=cut


=item get_website_realm($)

Usage:
	my ($err_msg, $p_realm_data) = $realms->get_website_realm( $url );

Returns a realm object for the first website-style realm whose base_url matches $url.

If no such website-realms exist, it will try to create one.  If it fails, an error message will be returned.

=cut


=item hashref

Provides quick access to a hash containing all the information about a realm.

Usage:
	my ($err_msg, $p_realm_data) = $realms->hashref( 'foo' );
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

Dependencies:
	&main::url_encode
	&main::html_encode

=cut


=item html_encode($)

Usage:
	my $html_str = &html_encode($str);

Formats string consistent with embedding in an HTML document.  Escapes the
"><& characters.

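A minimal sketch of this escaping follows (the name html_encode_sketch is hypothetical).  The ampersand must be handled first; otherwise the "&" in each newly inserted entity would be escaped again.

```perl

	use strict;
	use warnings;

	# Hypothetical sketch: escape &, <, > and " for embedding in HTML.
	sub html_encode_sketch {
		my ($str) = @_;
		$str =~ s/&/&amp;/g;    # must come first
		$str =~ s/</&lt;/g;
		$str =~ s/>/&gt;/g;
		$str =~ s/"/&quot;/g;
		return $str;
	}

```
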
=cut


=item html_select_ex($$$$)

Usage:
	my ($count, $html) = $realms->html_select_ex( $attrib, $default, $class, $width1 );

=cut


=item leadpad($$$)

Usage:
	my $buffer = &leadpad( "foo", "0", 10 );
	returns "0000000foo"
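
A short sketch of the padding follows (the name leadpad_sketch is hypothetical).  Returning strings that are already long enough unchanged is an assumption; the real &leadpad may behave differently.

```perl

	use strict;
	use warnings;

	# Hypothetical sketch: prepend $pad_char until $str is $length characters.
	sub leadpad_sketch {
		my ($str, $pad_char, $length) = @_;
		my $missing = $length - length($str);
		return $missing > 0 ? ($pad_char x $missing) . $str : $str;
	}

```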

=cut


=item list_filter_rules

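Usage:
	my @rules = $indexrules->list_filter_rules();

	foreach my $p_rule (@rules) {
		my %rule = %$p_rule;
		my $name        = $rule{'name'};
		my $action      = $rule{'action'};
		my $occurrences = $rule{'occurences'};
		my $promote_val = $rule{'promote_val'};
		my $p_string    = $rule{'p_string'};
		foreach my $string (@$p_string) {
			# examine each match string
			}
		}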

=cut


=item listrealms

Usage:
	my @realms = $realms->listrealms('all');

Returns an array of references to all realms which match the attribute parameter.

=cut


=item load()

Usage:
	my $realms = new fdse_realms;
	my $err_msg = $realms->load();
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

Dependencies:
	&main::check_db_config
	&main::Trim
	&main::get_dbh
	package LockFile

=cut


=item load_files

Usage:
	my $err_msg = &load_files( $data_files_dir );

This function attempts to load all the script-specific data from files.  Sequence:

	requires common.pl
	uses common.pl to call &ReadInput to process user commands

	based on the user's commands, may require common_parse_page.pl and/or common_admin.pl

	changes directory to data folder
	loads strings
	loads realms
	loads rules

Failures with any of these actions are considered fatal errors, and the return values are set appropriately.

=cut


=item load_pics_descriptions

			my (@pics_codes, @pics_names, @pics_values) = ();
			$err_msg = &load_pics_descriptions( 'RASCi', \@pics_codes, \@pics_names, \@pics_values );
			next Err if ($err_msg);


Usage:
	$err_msg = &load_pics_descriptions( 'RASCi', \%pics_values, \%pics_names );

=cut


=item log_search

Usage:
	my $err_msg = &log_search( $realm, $terms, $rank, $documents_found, $documents_searched );

Where:
	$realm == the realm name; 'All' for cases where the realm hasn't been specified
	$terms * == the literal string that the user typed in.
	$rank == the starting number in displaying hits.  will be 1 for first search, 11 for "Next", 21 for "Next" after that, etc.
			used to calculate the depth that visitors go in searching for data
	$documents_found == integer; total documents matching $terms.  in theory $rank <= $documents_found

* when writing to the log, any commas or line breaks will be stripped from the Terms. Also, they will be &html_encode'd so "<" => "&lt;" etc.

The function internally looks up the visitor IP/hostname and the current time.

The $err_msg is typically discarded (no reason to frighten visitors)

Dependencies:
	file: $const{'log_file'}
	$const{'file_mask'}
	$Rules{'sql: logfile'}
	&html_encode
	&db_exec
	$ENV{'REMOTE_HOST'}
	$ENV{'REMOTE_ADDR'}

=cut


=item migrate_log

Usage:
	&migrate_log( 'search.log.txt' );

Migrates a text log from the version before 2.0.0.0029 to the newer version.

Handles cases where the text logfile contains a mix of old and new records.

Writes status and error handling text to stdout.

The entire function is wrapped in an eval statement to protect against Time::Local not being available, or Time::Local trying to kill the process.


Dependencies:
	package LockFile

=cut


=item pagedata_from_file

Usage:
	my $err_msg = &pagedata_from_file( $file, $URL, \%pagedata, \$fr );

$fr is an initialized filter rules object (passed by reference between calls to pagedata_from_file for efficiency).

Dependencies:
	&ReadFile
	$Rules{'max characters: file'}
	&process_text

=cut


=item parse_html_ex

Parses the HTML text of a web page, and returns relevant fields.  Based mostly on byref parameters for speed.
No error handling.

Usage:

	# Initialize inputs:
	my ($raw_text, $URL, $b_SaveLinks, @links) = ('foo', 'http://foo/', 1);

	# Declare outputs:
	my %pagedata = ();

	# Call
	&parse_html_ex($raw_text, $URL, $b_SaveLinks, \@links, \%pagedata);

Dependencies:
	%Rules
	%FORM
	%const
	&Assert
	&GetAbsoluteAddress
	&parse_meta_header
	&Capitalize
	&Trim

=cut


=item parse_pics_label

Usage:
	my ($is_denied, $require_approval, $err_msg) = &parse_pics_label( $text );

Determines whether there is a PICS meta tag in the HTML $text supplied.  If there is, and if this script is concerned with PICS (as evidenced by the appropriate %Rules), then it parses the tag and compares values to the %Rules maximums.

If it finds that the document will $require_approval, it notes this and continues parsing.  If it finds that the document $is_denied, it exits immediately.  The $err_msg contains information about the final rule violated.

Dependencies:
	%main::Rules
	&main::parse_meta_header

=cut


=item parse_url

Usage:
	my ($clean_url, $host, $port, $path, $is_valid) = &parse_url( $url );

Dependencies:
	&clean_path

=cut


=item present_queued_pages($)

Usage:
	&present_queued_pages( $realm );

Displays a list of all pages waiting for approval.

=cut


=item process_text

Usage:
	my ($is_error, $err_msg, $allow_follow, $is_redirect, $full_redir_url) = &process_text( \$text, $url );

Dependencies:
	$Rules{'crawler: rogue'}
	$Rules{'minimum page size'}
	$Rules{'crawler: minimum whitespace'}
	&parse_meta_header
	&GetAbsoluteAddress

=cut


=item query_realm($$$$$)

Usage:
	my ($err_msg) = &query_realm( $realm, $url_pattern, $start_pos, $max_results, \%crawler_results );
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

=cut


=item read_tokens($$)

Reads the hash %Tokens out of file $auth_file.

Usage:
	my ($err_msg) = &read_tokens( $auth_file, \%Tokens );
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

=cut


=item realm_count

Usage:
	my $int_realms = $realms->realm_count('all');
	my $int_bound_realms = $realms->realm_count('has_base_url');

Returns an integer for the number of realms that match the attribute passed as an argument.  If no attribute is passed, returns the total number of realms.

=cut


=item realm_interact($$)

Usage:
	my %code = ();
	&realm_interact( $p_realm_data, $Rules{'sql: enable'}, \%code );

	Assumes
	my ($i_url, $i_lastmodt, $i_record, %pagedata) = ()

	use $i_line to seek for a resume operation
	$i_line is also incremented with the record count during operations, for use in suspend/resume operations


	standard Err block handling

Returns

	$code{'init'}
	$code{'resume'}

	$code{'suspend'}
	$code{'abort'}
	$code{'finish'}

	$code{'get_next'} assigns to ($i_url, $i_lastmodt, $i_record)

	$code{'update'} writes based on $i_url / %pagedata
	$code{'insert'} writes based on %pagedata
	$code{'preserve'} ($i_url / $i_record)
	$code{'delete'} ($i_url)

=cut


=item rebuild_realm($)

Usage:
	my ($err_msg, $is_complete) = &rebuild_realm( $realm );

Attempts to rebuild the realm.  Does The Right Thing based on the type of realm we're dealing with.

Dependencies:
	$realms
	package LockFile
	&BuildIndex
	&url_encode
	&s_AddURL

=cut


=item s_AddURL($$@)

Usage:
	&s_AddURL($b_IsAnonAdd, $Realm, @AddressesToIndex);

This is the main function for adding web pages to the realms, both for administrators and anonymous visitors.  Internally handles the crawling, error handling, HTML parsing, and storage.

Dependencies:
	package Crawler;
	package fdse_filter_rules
	&update_realm
	&SaveLinksToFileEx

=cut


=item s_CrawlEntireSite($)

Usage:
	my ($err_msg, $is_complete) = &s_CrawlEntireSite( $realm );

=cut


=item save_realm_data

Usage:
	my $err_msg = $realms->save_realm_data();

Takes the current $realms object and persists it to the associated file.  Returns the error/success of the operation.

Since save_realm_data is typically called whenever state has changed, this method also flushes all caches.

Dependencies:
	&get_dbh
	&check_db_config
	&Commas
	package LockFile
	file: searchdata/search.realms.txt
	file: searchdata/total_pages_indexed.txt

=cut


=item setpagecount

Usage:
	my $name = "My Realm";
	my $n_pages = 1000;
	my ($err_msg) = $realms->setpagecount($name, $n_pages);
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}
	else {
		print "<P>Now there are $n_pages pages in realm '$name'!</P>\n";
		}

=cut


=item str_jumptext($$$$$)

Usage:
	my $jumptext = &str_jumptext( $current_pos, $units_per_page, $maximum, $url, $b_is_exact_count );

Creates the HTML text for a "<- Previous 1 2 3 4 5 Next ->" block.

Everything is 1-based.

=cut


=item text_record_from_hash

Creates a textfile record out of the constituent fields.

Usage:
	my ($err_msg, $text_record) = &text_record_from_hash(\%pagedata);

=cut


=item ui_AdminPage()

Usage:
	&ui_AdminPage();

Default view into the search engine.

=cut


=item ui_DeleteRecord

Usage:
	&ui_DeleteRecord();

DeleteRecord provides an interactive HTML interface for record deletions.  It allows:
	record deletion based on Realm and URL(s)
	querying for multiple records based on URL patterns
It is primarily called from the AdminVersion output. It can also be called by itself, for pattern-deletes.

if $realm and $query_pattern

	DeleteRecord will search $realm for all records which match $query_pattern.
	They are shown to the user, who can then choose whether to delete all those records or not

else if $realm and @urls_to_delete

	DeleteRecord will try to delete all the records by calling update_realm

else

	DeleteRecord will offer a delete interface - browse realm or select realm, type in URL to delete


Because $query_pattern may be handed off to SQL, only ".*" can be used safely; it is mapped to "%" for SQL queries.  Other Perl regular expressions are passed through, so enhanced Perl expressions (or SQL expressions) can still be leveraged if the user knows about the underlying data storage system.  Code-executing regular expressions using ?{} are stripped for security.

Dependencies:
	&update_realm
	%FORM
	&html_encode

=cut


=item ui_EditRecord()

Usage:
	&ui_EditRecord();

Handles entire process of editing URL records.

=cut


=item ui_FilterRules

This function handles the admin user interface for managing filter rules.

Usage:
	&ui_FilterRules();

Error handling is done by printing HTML to the end user.

=cut


=item ui_GeneralRules($$$$)

Usage:
	my $is_edit_mode = &ui_GeneralRules( $action_name, \@setting_names );

Displays the settings from the %Rules hash, and the descriptions for each setting.  Allows validated edits for each setting based on datatype.

In general, the %Rules architecture should be replaced with an array.  Using an English-keyed hash is hard to translate, and also uses more memory.

=cut


=item ui_ManageAds

This prints the admin view HTML for controlling advertisements.  It also handles the actions of the forms on this UI, including changing positions, defining new ads, and resetting usage data.

=cut


=item ui_ManageRealms()

Usage:
	&ui_ManageRealms();

Presents the HTML form used to define a new realm, or to customize an existing realm.

=cut


=item ui_PersonalSettings()

Usage:
	&ui_PersonalSettings();

=cut


=item ui_Rebuild()

Usage:
	&ui_Rebuild();

Attempts to rebuild the given realm.

=cut


=item ui_UserInterface()

Usage:
	&ui_UserInterface();

Handles entire process of editing user-interface specific settings.

=cut


=item ui_ViewStats()

Usage:
	&ui_ViewStats();

Provides full user interface for viewing search log.

All error handling is done via HTML presented to the user; no errors are returned.

Dependencies:
	%FORM
	&get_dbh
	$const{'log_file'}

=cut


=item update_database

Dependencies:
	&get_dbh

Usage:
	my ($err_msg, $entry_count, $duplicates) = &update_database( $realm, \%crawler_results );

Error handling:
	Returns $err_msg if there's a problem, suitable for inclusion in <P><B>Error:</B> $err_msg.</P>

=cut


=item update_file

Usage:
	my ($err_msg, $entry_count, $duplicates) = &update_file( $realm, \%crawler_results );
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

Dependencies:
	package LockFile
	$realms
	&Assert
	%Rules
	%const

=cut


=item update_realm($$)

Incorporates the results of a crawl - stored in the %crawler_results hash - into the underlying storage container for $realm.  Includes adding new records, updating existing records, and deleting expired records.

Usage:
	my ($err_msg, $total_records, $new_records, $updated_records, $deleted_records) = update_realm( $realm, \%crawler_results );
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}
	else {
		print "<P>There are now $total_records web pages in the '$realm' realm - $new_records records created; $updated_records updated; $deleted_records removed.</P>\n";
		}

Dependencies:
	%Rules
	&update_database
	&update_file


=cut


=item url_encode

Usage:
	my $str_url = &url_encode($str);

Formats strings consistent with RFC 1945 by rewriting metacharacters in their
%HH format.
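
A sketch of %HH encoding follows.  The name url_encode_sketch is hypothetical, and the set of characters left unencoded (letters, digits and "-_.~") is an assumption; the real &url_encode may leave a different set alone or encode spaces as "+".

```perl

	use strict;
	use warnings;

	# Hypothetical sketch: every byte outside A-Z a-z 0-9 and "-_.~" becomes %HH.
	sub url_encode_sketch {
		my ($str) = @_;
		$str =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
		return $str;
	}

```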

=cut


=item use_database

Usage:
	$realms->use_database( 1 );

Sets the $self->{'use_db'} scalar.

Useful for data migration.

Example:
	# Loads all realm data from file; saves all data to database:

	$realms->use_database(0); # now using file
	$realms->load();

	# all realm data is currently in memory

	$realms->use_database(1); # now using database
	$realms->save_realm_data(); # just wrote all the data to the database

=cut


=item validate

This function takes all the parameters that could make up a filter rule, and determines whether they are valid or not.  Returns a text error message if the rule would not be valid.

Usage:
	my ($err_msg) = $FR->validate($enabled, $name, $action, $promote_val, $analyze, $mode, $occurrences, @substr);
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

=cut


=item write_tokens($$)

Saves the %Tokens hash to file $auth_file.

Usage:
	my ($err_msg) = &write_tokens( $auth_file, \%Tokens );
	if ($err_msg) {
		print "<P><B>Error:</B> $err_msg.</P>\n";
		}

=cut

