Hibernate Search indexes incomplete documents

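I'm having trouble indexing data in a batch job. I want to index a list of Article objects, and I have added @IndexedEmbedded to the members whose information I need. Article gets additional information from two other beans: Page and Articlefulltext.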

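Thanks to the Hibernate Search annotations, the batch correctly updates the database and adds the new documents to my Lucene index. But the added documents have incomplete fields. Hibernate Search does not seem to see all the annotations.

When I inspect the resulting Lucene index with Luke, I see fields for the Article and Page objects but none for Articlefulltext, even though the database contains the correct data, which means the persist() calls completed properly.

I really need some help, because I can't see any difference between my Page and my Articlefulltext.

Strangely, if I use a MassIndexer, it correctly adds the Article + Page + Articlefulltext data to the Lucene index. But I don't want to rebuild an index of millions of documents every time I do a large update.

I set the log4j logging level to debug for Hibernate Search and Lucene. They don't give me much information.

Here is my bean code and the batch code.

Thanks in advance for your help.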

Article.java:

@Entity
@Table(name = "article", catalog = "test")
@Indexed(index="articleText")
@Analyzer(impl = FrenchAnalyzer.class)
public class Article implements java.io.Serializable {

    @Id
    @GeneratedValue(strategy = IDENTITY)
    @Column(name = "id", unique = true, nullable = false)
    @DocumentId        
    private Integer id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "firstpageid", nullable = false)
    @IndexedEmbedded
    private Page page;

    @Column(name = "heading", length = 300)
    @Field(name= "title", index = Index.YES, store = Store.YES)
    @Boost(2.5f)
    private String heading;

    @Column(name = "subheading", length = 300)
    private String subheading;

    @OneToOne(fetch = FetchType.LAZY, mappedBy = "article") 
    @IndexedEmbedded
    private Articlefulltext articlefulltext;
    [... bean methods etc ...]
Page.java

@Entity
@Table(name = "page", catalog = "test")
public class Page implements java.io.Serializable {

    private Integer id;
    @IndexedEmbedded
    private Issue issue;
    @ContainedIn
    private Set<Article> articles = new HashSet<Article>(0);
    [... bean method ...]

Articlefulltext.java:

@Entity
@Table(name = "articlefulltext", catalog = "test")
@Analyzer(impl = FrenchAnalyzer.class)
public class Articlefulltext implements java.io.Serializable {

    @GenericGenerator(name = "generator", strategy = "foreign", parameters = @Parameter(name = "property", value = "article"))
    @Id
    @GeneratedValue(generator = "generator")
    @Column(name = "aid", unique = true, nullable = false)
    private int aid;

    @OneToOne(fetch = FetchType.LAZY)
    @PrimaryKeyJoinColumn
    @ContainedIn
    private Article article;

    @Column(name = "fulltextcontents", nullable = false)
    @Field(store=Store.YES, index=Index.YES, analyzer = @Analyzer(impl = FrenchAnalyzer.class), bridge= @FieldBridge(impl = FulltextSplitBridge.class))
    // This field is not added to the resulting Document! I put a log statement into FulltextSplitBridge, and it is never called during a batch run. But if I use a MassIndexer, I see that FulltextSplitBridge is called for each Articlefulltext.
    private String fulltextcontents;
    [... bean method ...]
Here is the code used to update the database and the Lucene index.

Batch source code:

FullTextEntityManager em = null;

@Override
protected void executeInternal(JobExecutionContext arg0) throws JobExecutionException {
    ApplicationContext ap = null;
    EntityManagerFactory emf = null;
    EntityTransaction tx = null;


    try {
        ap = (ApplicationContext) arg0.getScheduler().getContext().get("applicationContext");
        emf = (EntityManagerFactory) ap.getBean("entityManagerFactory", EntityManagerFactory.class);
        em = Search.getFullTextEntityManager(emf.createEntityManager());
        tx = em.getTransaction();


        tx.begin();
        // [... em.persist() calls for entities that are not Lucene-related, skipped ...]
        for (File xmlFile : xmlList) {
            Reel reel = new Reel(title, reelpath);
            em.persist(reel);
            Article article = new Article();
            // [... set Article fields, skipped ...]
            Articlefulltext ft = new Articlefulltext();
            // [... set Articlefulltext fields, skipped ...]
            ft.setArticle(article);
            ft.setFulltextcontents(bufferBlock.toString());
            em.persist(ft); // ft is persisted before article because of FK issues
            em.persist(article); // here the annotations update the Lucene index, but fulltextcontents is not indexed
            if (nbFileDone % 50 == 0) {
                // flush a batch of inserts and release memory
                em.flush();
                em.clear();
            }
        }
        tx.commit();
    }
    catch(Exception e){
        tx.rollback();
    }
    em.close();
}

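Well, it looks like you are not setting both sides of the relationship. I can see ft.setArticle(article), but not article.setFtArticle(ft). Both sides of the relationship need to be set. In your case Articlefulltext is the owner of the relationship, but that does not mean you can skip setting both sides.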
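
To make the point concrete, here is a minimal plain-Java sketch of the difference between a one-sided and a two-sided link. It is an illustration under assumptions, not the asker's real code: the JPA and Hibernate Search annotations are left out so that it runs on its own, the setter name setArticlefulltext is assumed from the field name in Article.java (the post above calls it setFtArticle), and making that setter update the other side as well is one possible convention rather than something the question's beans are known to do.

```java
// Plain-Java stand-ins for the two beans; persistence and indexing annotations are omitted on purpose.
class Article {
    private Articlefulltext articlefulltext;

    Articlefulltext getArticlefulltext() {
        return articlefulltext;
    }

    // Assumed setter name. It also updates the other side so both ends stay in sync.
    void setArticlefulltext(Articlefulltext ft) {
        this.articlefulltext = ft;
        if (ft != null && ft.getArticle() != this) {
            ft.setArticle(this);
        }
    }
}

class Articlefulltext {
    private Article article;
    private String fulltextcontents;

    Article getArticle() {
        return article;
    }

    // Plain setter, as used in the batch code: it only sets this side.
    void setArticle(Article article) {
        this.article = article;
    }

    String getFulltextcontents() {
        return fulltextcontents;
    }

    void setFulltextcontents(String fulltextcontents) {
        this.fulltextcontents = fulltextcontents;
    }
}

class BidirectionalLinkSketch {
    public static void main(String[] args) {
        // What the batch does today: only the owning side is set.
        Article article = new Article();
        Articlefulltext ft = new Articlefulltext();
        ft.setFulltextcontents("some text");
        ft.setArticle(article);
        // Anything that walks from the Article to its full text finds nothing here.
        System.out.println("one-sided, article sees full text: "
                + (article.getArticlefulltext() != null));

        // Setting the inverse side as well, before persisting.
        article.setArticlefulltext(ft);
        System.out.println("two-sided, article sees full text: "
                + (article.getArticlefulltext() != null));
    }
}
```

In the batch loop this would amount to adding one call on the article next to ft.setArticle(article), before the two em.persist() calls.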

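Hmm, you're right, thank you very much... It was that simple. The strange thing is that for Page I only set one side of the relationship, and it works fine. Is that because it is a ManyToOne relationship?

It depends on which direction we have to traverse the relationship links to update the index. Most likely the Page content is filled in from the main direction (only), so it works by accident.

"By accident", so the "best" practice is to define both sides of every relationship? Good to know, I will do it everywhere. Thank you both!

You don't have to define every relationship as bidirectional, but if you do, you must update both sides at the same time; otherwise it is ambiguous which instance holds the correct information. Search may require a relationship to be defined as bidirectional, so that there is a place to define both IndexedEmbedded and ContainedIn.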