midcom_services_indexer_document
This class encapsulates a single indexer document. It is used for both indexing and retrieval.
A document consists of a number of fields. Each field has different properties when handled by the indexer (the exact behavior depends, as always, on the indexer backend in use). On retrieval this field information is lost, and all fields are of the same type. The core indexer backend supports these field types: date, keyword, text, unindexed, and unstored, each added through its own method described further down.
A number of predefined fields are available using member fields. These fields are all meta-fields. See their individual documentation for details. All fields are mandatory unless mentioned otherwise explicitly and, as always, assumed to be in the local charset.
Remember that neither date nor unstored fields are available on retrieval. For the core fields, all timestamps are therefore stored twice: once as a searchable field and once as a readable timestamp.
The class automatically passes all data to the i18n charset conversion functions, so you work in your site's charset as usual. UTF-8 conversion is done implicitly.
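The implicit conversion described above can be pictured with a minimal sketch. The helper name `sketch_to_utf8` and the ISO-8859-1 site charset are assumptions made for illustration only; the real class delegates to the MidCOM i18n service, whose interface is not shown in this reference.

```php
<?php
// Hypothetical helper, NOT part of MidCOM: converts a value from the
// site's charset to UTF-8 before it would be handed to the indexer.
function sketch_to_utf8($value, $site_charset)
{
    // Nothing to do when the site already works in UTF-8.
    if (strtoupper($site_charset) === 'UTF-8') {
        return $value;
    }
    // iconv() is a standard PHP extension function.
    return iconv($site_charset, 'UTF-8', $value);
}

// "Straße" encoded in ISO-8859-1 (assumed site charset for this example).
$latin1 = "Stra\xDFe";
$utf8 = sketch_to_utf8($latin1, 'ISO-8859-1');
```

The caller only ever sees its own charset; the UTF-8 form exists solely on the way into the index.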
Located in /midcom/services/indexer/document.php (line 48)
| Class | Description |
|---|---|
| midcom_services_indexer_document_attachment | This is a class geared at indexing attachments. It requires you to "assign" the attachment to a topic, which is used as TOPIC_URL for permission purposes. In addition, you may set another MidgardObject as source object; its GUID is stored in the __SOURCE field of the index. |
| midcom_services_indexer_document_midcom | This is a base class targeted at MidCOM content object indexing. It should be used whenever MidCOM documents are indexed, either directly or as a base class. |
string $component = '' (line 107)
The name of the component responsible for the document. May be empty for non-midgard resources.
This field is mandatory.
string $content = '' (line 181)
The content of the document.
This is mandatory.
This field is empty on documents retrieved from the index.
int $created = 0 (line 125)
The time of document creation, as a UNIX timestamp.
This field is mandatory.
MidgardPerson $creator = null (line 152)
The MidgardPerson who created the object.
This is optional.
string $document_url = '' (line 116)
The fully qualified URL to the document. This should be a PermaLink.
This field is mandatory.
int $edited = 0 (line 134)
The time of the last document modification, as a UNIX timestamp.
This field is mandatory.
MidgardPerson $editor = null (line 161)
The MidgardPerson who last modified the object.
This is optional.
int $indexed = 0 (line 143)
The timestamp of indexing.
This field is added automatically and should be considered read-only.
string $RI = '' (line 87)
The Resource Identifier of this document. It must already be UTF-8 on assignment.
This field is mandatory.
double $score = 0.0 (line 75)
The score of this document. It is populated only on documents that come from a result set.
string $security = 'default' (line 259)
Security mechanism used to determine the availability of a search result.
Can be one of:
string $source = '' (line 210)
An additional tag indicating the source of the document for use by the component doing the indexing. This value is not indexed and should not be used by anybody except the component doing the indexing.
This is optional.
GUID $topic_guid = '' (line 97)
The GUID of the topic the document is assigned to. May be empty for non-midgard resources.
This field is mandatory.
string $topic_url = '' (line 225)
The full path to the topic that houses the document. For external resources, this should be either a MidCOM topic with which the resource is associated or some "directory" by which you could filter. You may also leave it empty, which prevents the document from appearing in any topic-specific search.
The value should be fully qualified, as returned by MIDCOM_NAV_FULLURL, including a trailing slash, e.g. https://host/path/to/topic/
This is optional.
string $type = '' (line 239)
The type of the document, set by subclasses and added to the index automatically.
The type *must* reflect the original type hierarchy. It is to be set using the $this->_set_type call after initializing the base class.
Array $_fields = array() (line 59)
An associative array containing all fields of the current document.
Each field is indexed by its name (a string). The value is another array containing the fields "name", "type" and "content".
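A minimal sketch of what such a field list might look like, together with the silent-overwrite and silent-remove behaviour documented for the add and remove methods. The helper `sketch_store_field` and the sample field names are hypothetical and are not part of the class; only the "name", "type" and "content" keys come from the description above.

```php
<?php
// Hypothetical stand-in for the $_fields member, not the MidCOM class itself.
$fields = array();

// Illustrative helper: stores one field record keyed by its name.
function sketch_store_field(array &$fields, $name, $type, $content)
{
    // A second call with the same name silently replaces the earlier record.
    $fields[$name] = array(
        'name'    => $name,
        'type'    => $type,
        'content' => $content,
    );
}

sketch_store_field($fields, 'title', 'text', 'Hello world');
sketch_store_field($fields, 'title', 'text', 'Replaced title');
sketch_store_field($fields, 'abstract', 'unstored', 'Short summary');

// Removing a field that does not exist is a no-op, as is documented
// for field removal; removing an existing one drops its record.
unset($fields['no_such_field']);
unset($fields['abstract']);
```

Keying the array by field name is what makes overwriting and removal cheap and silent: no search through the list is needed.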
midcom_service_i18n $_i18n = null (line 67)
A reference to the i18n service, used for charset conversion.
Initialize the object, nothing fancy here.
Add a date field. A timestamp is expected, which is automatically converted to a suitable ISO timestamp before storage.
Direct specification of the ISO timestamp is not yet possible due to lacking validation outside the timestamp range.
If a field of the same name is already present, it is overwritten silently.
This is a small helper which creates a normal date field and an unindexed, _TS-postfixed timestamp field at the same time.
This is useful because date fields are not stored in a readable format; it cannot even be determined that they were dates in the first place. The _TS field therefore gives you access to the original timestamp value.
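The pairing can be sketched as follows. The function name `sketch_add_date_pair`, the plain-array field store and the exact ISO 8601 format string are assumptions for illustration; the reference does not state which ISO representation the real class writes.

```php
<?php
// Hypothetical sketch, not the MidCOM implementation: stores a date field
// plus an unindexed companion holding the original UNIX timestamp.
function sketch_add_date_pair(array &$fields, $name, $timestamp)
{
    // Searchable date field; the ISO 8601 UTC format is an assumed choice.
    $fields[$name] = array(
        'name'    => $name,
        'type'    => 'date',
        'content' => gmdate('Y-m-d\TH:i:s\Z', $timestamp),
    );
    // Readable companion field carrying the raw timestamp under NAME_TS.
    $fields[$name . '_TS'] = array(
        'name'    => $name . '_TS',
        'type'    => 'unindexed',
        'content' => $timestamp,
    );
}

$fields = array();
sketch_add_date_pair($fields, 'created', 86400);
```

After retrieval only the `created_TS` value would still be usable as a timestamp, which is the reason for storing both.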
Add a keyword field.
Add a search result field, this should normally not be done manually, the indexer will call this function when creating a document out of a search result.
Add a text field.
Add an unindexed field.
Add an unstored field.
Returns a textual representation of the specified datamanager field.
The actual behavior depends on the datatype. Text fields are accessed directly; for other fields, the CSV representation is used.
Text fields are run through the html2text converter of the document base class.
Attention: This function accesses originally private datamanager members. It is the only possible way to access the CSV interface of individual fields.
Debugging helper which dumps the document's contents to the log file using the indicated log level. It checks the log level explicitly for performance reasons.
Note: print_r'ing the entire document might not be an option, as subclasses contain references to non-dumpable objects like the datamanager.
This function should be called after retrieving a document from the index. It will populate all relevant members with the according values.
Returns the contents of the field name or false on failure.
Returns the complete internal field record, including type and UTF-8 encoded content.
This should normally not be used from the outside, it is geared towards the indexer backends, which need the full field information on indexing.
This is a small helper that converts HTML to plain text in a relatively simple way:
Basically, JavaScript blocks and HTML tags are stripped, and all HTML entities are converted to their native equivalents.
Tags are replaced with a space rather than an empty string, so that constructs like <li>torben</li><li>nehmer</li> are recognized correctly. While this might result in double spaces between words, this is better than losing the word boundaries entirely.
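The approach described above can be approximated with a short sketch. The function name `sketch_html2text` and the regular expressions are assumptions, not the class's actual code; they only illustrate stripping script blocks, replacing tags with a space, and decoding entities.

```php
<?php
// Hypothetical approximation of the described conversion, not MidCOM code.
function sketch_html2text($html)
{
    // Drop JavaScript blocks together with their content.
    $text = preg_replace('/<script\b[^>]*>.*?<\/script>/is', ' ', $html);
    // Replace each remaining tag with a space to keep word boundaries.
    $text = preg_replace('/<[^>]*>/', ' ', $text);
    // Convert HTML entities to their native characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Trim the ends only; inner double spaces are left untouched.
    return trim($text);
}

$plain = sketch_html2text('<li>torben</li><li>nehmer</li>');
```

Here the two list items stay separate words, at the cost of a double space between them, which matches the trade-off the description accepts.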
Checks whether the given document is an instance of the given document type.
This is equivalent to the is_a object hierarchy check, except that it works with MidCOM documents.
This will translate all member variables into appropriate field records, the backend should call this immediately before indexing.
This call will automatically populate indexed with time() and author with the name of the creator (if set).
Remove a field from the list. Nonexistent fields are ignored silently.
Internal helper which actually stores a field.
Sets the type of the object, reflecting the inheritance hierarchy.
Documentation generated on Mon, 21 Nov 2005 18:14:53 +0100 by phpDocumentor 1.3.0RC3