Google Query Language - University of Alabama at

Google Query Language
-- a DSL for Advanced Google Searching
Xiaoqing Wu
Advisor: Dr. Barrett R. Bryant
Department of Computer and Information Science
03/04/2005
Background
• PhD research: Compiler Development Environment (CDE)
– Automatic generation of compilers, interpreters, and integrated development environments
– Several domain-specific languages have been developed on top of CDE
• GQL: an application based on CDE, built on a database analogy
– Internet: the database
– Google: the database management system (DBMS)
– GQL: the Structured Query Language (SQL)
Google: more than keyword searching
• Language preference
• File format, date, occurrences, domain
• Image, forum, shopping search
Query customization in Google
• Filling forms
• Writing meta-tokens directly
– allintext: Xiaoqing Wu filetype:pdf
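Writing meta-tokens directly means the whole customization travels as one plain search string. The minimal Java sketch below shows the request such a string becomes, assuming the standard `https://www.google.com/search?q=` endpoint (the slides do not specify one); `GoogleUrl.build` is a hypothetical helper name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GoogleUrl {
    // Assumption: meta-tokens typed into the search box are sent unchanged
    // as the form-encoded "q" parameter of the search endpoint.
    public static String build(String metaTokens) {
        return "https://www.google.com/search?q="
                + URLEncoder.encode(metaTokens, StandardCharsets.UTF_8);
    }
}
```

For the example above, `GoogleUrl.build("allintext: Xiaoqing Wu filetype:pdf")` yields `https://www.google.com/search?q=allintext%3A+Xiaoqing+Wu+filetype%3Apdf`. Because the meta-tokens are ordinary text inside `q`, nothing checks them before they reach Google.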
Why GQL (I)?
• Forms are not flexible
– Fixed set of fields and options
– Can’t be saved and reused
– Filling multiple forms is time-consuming
– Mouse operation is slower than keyboard operation
Why GQL (II)?
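• Meta-tokens are not designed for end users
– Not user-friendly
– No formal syntax provided
– No type checking
– Ambiguous: in the query below, does OR join only keyword3 and keyword4, or everything on either side of it?
keyword1 keyword3 OR keyword4 "keyword2"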
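The ambiguity is one of operator precedence. The Java sketch below reads the flat query under one assumed rule: OR binds only the term just before it and the term just after it, and all remaining groups are implicitly ANDed. The class name `OrGrouping` is hypothetical. A user who assumes the opposite rule would instead read `(keyword1 keyword3) OR (keyword4 "keyword2")`, and the meta-token notation offers no way to tell which is meant.

```java
import java.util.ArrayList;
import java.util.List;

public class OrGrouping {
    // Splits a flat meta-token query into AND-ed groups under one assumed rule:
    // OR binds only the term immediately before it and the term after it.
    // Quoted phrases containing spaces are not handled in this sketch.
    public static List<String> group(String query) {
        String[] words = query.trim().split("\\s+");
        List<String> groups = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            boolean joinsNeighbours =
                    words[i].equals("OR") && !groups.isEmpty() && i + 1 < words.length;
            if (joinsNeighbours) {
                // Merge the previous group with the next term: "a", "b" -> "a OR b".
                String previous = groups.remove(groups.size() - 1);
                i++;
                groups.add(previous + " OR " + words[i]);
            } else {
                groups.add(words[i]);
            }
        }
        return groups;
    }
}
```

Under this rule, `OrGrouping.group("keyword1 keyword3 OR keyword4 \"keyword2\"")` returns `[keyword1, keyword3 OR keyword4, "keyword2"]`. In GQL the question does not arise, because keys are separated by commas (rule [3] of the grammar), so the grouping is written out explicitly.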
GQL: A well-formed DSL
• User-friendly grammar
– Natural, SQL-like syntax rules that are easy to follow
– No ambiguity
• IDE support
– Automatic syntax and type checking
• Program-based queries
– Queries can be saved and reused
– New searches can start from an old query
• Flexible: one language replaces numerous forms
No more forms!
search {key}*
from file
where {constraint}*
Demo
GQL Syntax Grammar
[1] query ::= (SEARCH | IMAGE) o_keylist occurrence constraints withinstmt
[2] o_keylist ::= keylist |
[3] keylist ::= key | keylist COMMA key
[4] key ::= word | noword | orwordlist | exactword
[5] word ::= STRING
[6] noword ::= NOT word
[7] orwordlist ::= orword OR orword | orwordlist OR orword
[8] orword ::= word | exactword
[9] exactword ::= QSTRING
[10] occurrence ::= FROM OCCVALUE |
[11] constraints ::= WHERE constraintlist |
[12] constraintlist ::= constraint | constraintlist constraint
[13] constraint ::= domain | filetype
[14] domain ::= indomain | outdomain
[15] indomain ::= DOMAIN EQ url
[16] outdomain ::= DOMAIN NE url
[17] url ::= QSTRING
[18] filetype ::= acceptfiletype | rejectfiletype
[19] acceptfiletype ::= TYPE EQ TYPEVALUE
[20] rejectfiletype ::= TYPE NE TYPEVALUE
[21] withinstmt ::= WITHIN QSTRING |
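To show how the grammar drives translation, here is a hand-written recursive-descent sketch in Java that parses a token stream and emits Google meta-tokens in one pass. It is not the CUP-generated parser the deck describes, and several details are assumptions: token kinds are named after the grammar's terminals, but their concrete lexemes are unknown; the mapping of OCCVALUE values (`title`, `text`, `url`) to Google's `allintitle:`, `allintext:`, `allinurl:` operators is hypothetical; `IMAGE` is accepted but treated like `SEARCH`; and the `WITHIN` clause is parsed but not translated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GqlTranslator {
    // Terminals of the GQL grammar; EOF (added here) marks the end of input.
    public enum Kind { SEARCH, IMAGE, STRING, QSTRING, COMMA, NOT, OR, FROM, OCCVALUE,
                       WHERE, DOMAIN, TYPE, EQ, NE, TYPEVALUE, WITHIN, EOF }

    // Assumption: QSTRING text arrives without its surrounding quotes.
    public record Token(Kind kind, String text) {}

    // Assumed mapping from OCCVALUE to Google's "allin...:" operators.
    private static final Map<String, String> OCCURRENCE = Map.of(
            "title", "allintitle:", "text", "allintext:", "url", "allinurl:");

    private final List<Token> tokens;
    private int pos = 0;

    public GqlTranslator(List<Token> tokens) { this.tokens = tokens; }

    private Token peek() {
        return pos < tokens.size() ? tokens.get(pos) : new Token(Kind.EOF, "");
    }

    private Token expect(Kind kind) {
        Token t = peek();
        if (t.kind() != kind) throw new IllegalStateException("expected " + kind + " but found " + t.kind());
        pos++;
        return t;
    }

    // [1] query ::= (SEARCH | IMAGE) o_keylist occurrence constraints withinstmt
    public String query() {
        if (peek().kind() == Kind.IMAGE) pos++; else expect(Kind.SEARCH);
        List<String> keys = keylistOpt();
        List<String> out = new ArrayList<>();
        if (peek().kind() == Kind.FROM) {              // [10] occurrence (optional)
            pos++;
            String occ = expect(Kind.OCCVALUE).text();
            String op = OCCURRENCE.get(occ);
            if (op == null) throw new IllegalStateException("unknown occurrence: " + occ);
            out.add(op);                               // the occurrence operator leads the query
        }
        out.addAll(keys);
        if (peek().kind() == Kind.WHERE) {             // [11]-[12] constraints (optional)
            pos++;
            do { out.add(constraint()); }
            while (peek().kind() == Kind.DOMAIN || peek().kind() == Kind.TYPE);
        }
        if (peek().kind() == Kind.WITHIN) { pos++; expect(Kind.QSTRING); } // [21] parsed, not translated
        expect(Kind.EOF);
        return String.join(" ", out);
    }

    // [2]-[3] o_keylist ::= keylist | (empty);  keylist ::= key | keylist COMMA key
    private List<String> keylistOpt() {
        List<String> keys = new ArrayList<>();
        Kind k = peek().kind();
        if (k != Kind.STRING && k != Kind.QSTRING && k != Kind.NOT) return keys;
        keys.add(key());
        while (peek().kind() == Kind.COMMA) { pos++; keys.add(key()); }
        return keys;
    }

    // [4] key ::= word | noword | orwordlist | exactword
    private String key() {
        if (peek().kind() == Kind.NOT) {               // [6] noword -> -word
            pos++;
            return "-" + expect(Kind.STRING).text();
        }
        List<String> alts = new ArrayList<>(List.of(orword()));
        while (peek().kind() == Kind.OR) { pos++; alts.add(orword()); } // [7] orwordlist
        return String.join(" OR ", alts);
    }

    // [8] orword ::= word | exactword;  [9] exactword -> "quoted phrase"
    private String orword() {
        if (peek().kind() == Kind.QSTRING) return "\"" + expect(Kind.QSTRING).text() + "\"";
        return expect(Kind.STRING).text();
    }

    // [13]-[20] domain / file-type constraints: EQ accepts, NE rejects ("-" prefix)
    private String constraint() {
        Kind head = peek().kind();
        if (head != Kind.DOMAIN && head != Kind.TYPE)
            throw new IllegalStateException("expected DOMAIN or TYPE but found " + head);
        pos++;
        boolean reject = peek().kind() == Kind.NE;
        expect(reject ? Kind.NE : Kind.EQ);
        String prefix = reject ? "-" : "";
        return head == Kind.DOMAIN
                ? prefix + "site:" + expect(Kind.QSTRING).text()
                : prefix + "filetype:" + expect(Kind.TYPEVALUE).text();
    }
}
```

Under these assumptions, the token stream SEARCH `Xiaoqing` COMMA `Wu` FROM `text` WHERE TYPE EQ `pdf` translates to `allintext: Xiaoqing Wu filetype:pdf`, the meta-token query from the earlier slide. Each grammar rule maps to one method, which keeps the translation easy to check against the BNF.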
GQL IDE structure
[Diagram: query program in the GQL IDE -> GQL compiler -> Google-recognizable tokens -> Google search engine -> query result]
Compiler implementation in CDE
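[Diagram: GQL specification -> TLG compiler, which produces a JLex specification, a CUP specification, AST nodes, type checking in AspectJ, and code generation in AspectJ; JLex specification -> JLex -> lexer in Java; CUP specification -> CUP -> parser in Java; lexer, parser, AST nodes, and aspects -> aspect weaving -> GQL compiler]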
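In this design, type checking is a separate concern woven into the generated AST node classes as an AspectJ aspect. The sketch below is a plain-Java stand-in (no AspectJ, no weaving) that only illustrates what a type-checking pass over constraint nodes (rules [13]-[20]) might look like. The node types `Field` and `Constraint` and both checks are hypothetical examples; they are not GQL's actual rules.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class GqlTypeChecker {
    // Hypothetical AST node for grammar rules [13]-[20]: a DOMAIN or TYPE
    // constraint that either accepts (EQ) or rejects (NE) a value.
    public enum Field { DOMAIN, TYPE }
    public record Constraint(Field field, boolean accept, String value) {}

    // Returns one message per problem found; an empty list means the
    // constraint list passed both (hypothetical) checks.
    public static List<String> check(List<Constraint> constraints) {
        List<String> errors = new ArrayList<>();
        Set<String> accepted = new HashSet<>();
        Set<String> rejected = new HashSet<>();
        for (Constraint c : constraints) {
            // Check 1: a domain or file type must have a non-blank value.
            if (c.value().isBlank()) {
                errors.add(c.field() + " value must not be empty");
                continue;
            }
            // Normalize case so "pdf" and "PDF" count as the same value.
            String key = c.field() + ":" + c.value().toLowerCase(Locale.ROOT);
            (c.accept() ? accepted : rejected).add(key);
        }
        // Check 2: accepting and rejecting the same value can never match anything.
        for (String key : accepted) {
            if (rejected.contains(key)) errors.add("contradictory constraints on " + key);
        }
        return errors;
    }
}
```

A query constrained by both `TYPE EQ pdf` and `TYPE NE PDF` would be reported as `contradictory constraints on TYPE:pdf` instead of being sent to Google and silently returning nothing. In the AspectJ version, a method like `check` would be attached to the node classes by the aspect rather than living in a separate class.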
Current status
• Basic GQL compiler
• IDE supporting management of multiple documents
– Program storage
– Editing
– Compiling, type-checking and execution
• Functionality including all features of Google
web & image search
• Search within old queries
Future work
• Extending the grammar to implement all the functionality provided by Google
• Adding stricter type checking for source programs written in GQL
• Search result integration
Conclusion
• To provide more flexibility in online search, a SQL-like query language has been developed for the Google query domain.
• GQL programs replace the query forms Google provides, analogous to the relationship between SQL and the query forms of a DBMS such as MS Access.
• The idea could be generalized to other domains, especially online search, e.g. airfare search.