ApacheCon EU 2014 has ended
Register Now for ApacheCon Europe 2014 - November 17-21 in Budapest, Hungary. 

Lucene / Solr
Tuesday, November 18
 

10:00am

High Performance Solr - Shalin Shekhar Mangar, LucidWorks Inc.
What makes an Apache Solr installation high performance? Learn about what's keeping that CPU hot, memory tight, disk screaming and network busy on your Solr installation. Optimize CPU usage, reduce memory and disk contention, unclog that network and learn about hidden gotchas of query and filter performance, DocValues, transaction logs etc.

Numerous Apache Solr performance tuning tips are available on the web, but they are scattered across the official Apache Solr reference guide, Apache Lucene/Solr javadocs, code comments, Jira comments, various books, the mailing list and many blogs. In many cases, such advice is not substantiated with numbers or annotated with the trade-offs.

Shalin will back each piece of advice with statistics and benchmarks as well as inform you about the trade-offs made so you can make more informed decisions.

Speakers
Shalin Shekhar Mangar

Senior Solr Consultant, Lucidworks
I have been a committer on Apache Lucene/Solr since 2008 and am a member of the Lucene/Solr project management committee. I currently work at Lucidworks Inc. on Apache Solr and Lucidworks Search, mostly on the SolrCloud side of things. In the past, I've worked at AOL for five years on...


Tuesday November 18, 2014 10:00am - 10:50am
Elod/Ond

1:30pm

Native Code And Off-Heap Data-Structures For Solr - Yonik Seeley, Heliosearch
Off-heap data structures and native code performance improvements for Apache Solr are being developed as part of the Heliosearch project. This presentation will cover the reasons behind these features, implementation details, and performance impacts. Recently developed features will also be covered (i.e. developed after this abstract was written).

Speakers
Yonik Seeley

Search Engineer, Cloudera
Yonik Seeley is the creator of Solr. He works at Cloudera integrating and leveraging "Big Search" technologies into their advanced platform for machine learning and analytics. Yonik was a co-founder of LucidWorks, and he holds a master's degree in computer science from Stanford U...


Tuesday November 18, 2014 1:30pm - 2:20pm
Elod/Ond

3:50pm

Multi Language Content Discovery Through Entity Driven Search - Antonio Morales, Zaizi
This presentation is about a semantic search engine built on top of a stack of Apache projects.
The system extracts content from heterogeneous data sources, processes and enriches it, indexes it, and offers search over this content with an advanced user experience.
This is achieved by integrating four top-level Apache projects: ManifoldCF, Stanbol, Marmotta and Solr.

Apache ManifoldCF is used to access the different data sources and extract content from them: it is the engine that drives the main indexer core of the system. The extracted documents are processed in a pipeline, which is where the integration with Apache Stanbol and Apache Marmotta takes place, allowing semantic enrichment of the content according to any Linked Data assets.
The last link in the chain is a custom Search API built on top of Apache Solr that provides advanced search features to improve the user experience.

Speakers
Antonio Morales

R&D Senior Engineer, Zaizi
Senior Software Engineer working at the R&D division of Zaizi. Computer Engineer and M.Sc Software Engineer and Technology with broad experience in Analysis, Design, Development and Integration of enterprise web, mobile and cloud applications. He is one of the most security expert...


Tuesday November 18, 2014 3:50pm - 4:40pm
Elod/Ond
 
Wednesday, November 19
 

9:30am

Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenko, STYLIGHT
Nowadays there are plenty of solutions for building a search subsystem. The question is how to keep such a system flexible, able to react quickly to data-driven decisions, and constantly improving in quality. This talk presents lessons learned from our experience of building a lean ranking infrastructure that can be used with a data-driven approach to product development. The slides walk through the process of scaling out the search system from a couple of countries to 13 countries around the world while keeping the flexibility to test hypotheses at different levels and perform A/B testing in different dimensions.

Speakers
Sergii Khomenko

Data Scientist, STYLIGHT
Data scientist at one of the biggest fashion communities, STYLIGHT. Data analysis and visualisation hobbyist, working on problems not only during working hours but also in free time, for fun and for personal data visualisations. Speaker at different conferences: Berlin Buzzwords 2014, ApacheCon...


Wednesday November 19, 2014 9:30am - 10:20am
Elod/Ond

10:40am

“Your Search Doesn’t Work.” - How to Find Out Whether or Not the Search Box You Offer Users is Helping - Isabel Drost-Fromm, Elasticsearch
Web sites without search functionality are unimaginable today - you search for comments and code on GitHub, you look for books in your favourite webshop, you use the search box of your favourite blog to find articles.

When offering search in your own application - how do you know that it actually benefits the user instead of causing lots of frustration over results not found? Only checking that your child's favourite book about witches is ranked at the top of all children's books clearly doesn’t help.

This talk will walk you through the options of determining search quality - from purely offline metrics that work even before deploying version 1.0 to production to online A/B testing to check continuous improvement. I will highlight some Lucene and Elasticsearch features that can tremendously help you deploy your own search quality checks.

Speakers
Isabel Drost-Fromm

Open Source Strategist, Europace AG
Isabel Drost-Fromm is Open Source Strategist at Europace AG Germany. She's a member of the Apache Software Foundation, co-founder of Apache Mahout and has mentored several incubating projects. Isabel is interested in all things FOSS, search and text mining with a decent machine learning...


Wednesday November 19, 2014 10:40am - 11:30am
Elod/Ond

11:40am

Flexible Search In Apache Jackrabbit Oak - Tommaso Teofili, Adobe Systems
Apache Jackrabbit Oak is the next generation content repository based on the JCR specification, designed to scale to high read/write throughput, a huge number of nodes in the repository and highly concurrent operations. In this presentation Tommaso Teofili will describe the flexible and pluggable search architecture of Oak, which allows multiple indices to be defined to address specific types of queries with specific constraints for performant indexing and searching. A deeper focus on the Apache Lucene and Apache Solr based index implementations will be given, with some insights into how they have been integrated to address hierarchical content search, together with some performance benchmarks and real-life use cases.

Speakers
Tommaso Teofili

Software Engineer, Adobe Systems
Open source enthusiast and member at the Apache Software Foundation, working as a software engineer for Adobe Systems on data replication and search. Passionate about natural language processing and machine learning.


Wednesday November 19, 2014 11:40am - 12:30pm
Elod/Ond