Create and populate a field when indexing

2007-11-09       - By KR


Grant Ingersoll-6 wrote:
> When you are indexing the file and adding the Document, you will need  
> to parse out your filename per your regular expression, and then  
> create the appropriate field:
> Document doc = new Document();
> String cat = getCategoryFromFileName(inputFileName);
> doc.add(new Field("category", cat, ...));
> //do the rest of your adds
> Just locate where in the demo the Document add is taking place (I  
> forget the exact spot) and then add in the appropriate stuff from  
> above.  Obviously, you need to implement the method I stubbed called  
> getCategoryFromFileName.
> HTH,
> Grant

Thanks, Grant. That was just the hint I needed.

I found that the fields are populated in HTMLDocument.

I added:

doc.add(new Field("category", "test", Field.Store.YES,

and then used Luke to verify that this field had been added. It had.

Now I am trying to get a quick-and-dirty way of setting the field based on
the filename, but I'm running into problems that I don't really understand
well enough to fix quickly.

I have only very limited experience of Java programming, so I might be using
the wrong terms, but I think the problem is variable scope. I get a
compilation error:

cannot find symbol
symbol  : variable url
location: class org.apache.lucene.demo.HTMLDocument
       if (url.indexOf("-ov-") != -1) {

I thought I'd be able to use a simple mechanism based on indexOf() to check
the existence of a short sequence of characters within the filename. For
example, "-sys-". I know that this sequence, if it exists anywhere in the
full path, must be in the filename.

So I put in a series of if statements like this:

  if (url.indexOf("-sys-") != -1) {
    String category = "system";

then right at the end:
doc.add(new Field("category", category, Field.Store.YES,

Am I right in thinking that the variable url is undefined at this point in
the code? It certainly seems to be defined earlier on in the file:

 public static String uid2url(String uid) {
   String url = uid.replace('\u0000', '/');    // replace nulls with slashes
   return url.substring(0, url.lastIndexOf('/')); // remove date from end

Is there some way for me to perhaps chop down to the filename here, and make
that available later in the code?
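
For reference, here is a minimal sketch of the kind of thing I am after,
written as a standalone class so it compiles without Lucene. Two points
about scope seem relevant. First, the url declared inside uid2url is a local
variable, so it only exists inside that method. Second, category has to be
declared before the if statements rather than inside them, or it will not be
visible at the doc.add call. The names CategoryFromFileName and categoryFor,
the "overview" category for "-ov-", and the "unknown" default are
placeholders of my own, not anything from the demo. I am also assuming that
the method building the Document has the file or its path available to pass
in.

```java
import java.io.File;

public class CategoryFromFileName {

    // Hypothetical helper: derives a category from markers in the file name.
    static String categoryFor(String path) {
        // Chop the path down to its last component, e.g. "manual-sys-setup.html".
        String fileName = new File(path).getName();

        // Declared outside the if blocks so it is still in scope afterwards.
        String category = "unknown"; // assumed default, not from the demo
        if (fileName.indexOf("-sys-") != -1) {
            category = "system";
        } else if (fileName.indexOf("-ov-") != -1) {
            category = "overview"; // assumed category name for "-ov-"
        }
        return category;
    }

    public static void main(String[] args) {
        System.out.println(categoryFor("/docs/manual-sys-setup.html"));
        System.out.println(categoryFor("/docs/manual-ov-intro.html"));
    }
}
```

The result could then be passed as the value in the doc.add call in place of
the hard-coded "test" string.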

