Topological Relation Aware Transformer
Abstract:
We present the Topological Relation Aware Transformer (T-RAT), a transformer whose attention heads are each specialized to an open set of the topology τ generated by S, the set of all pre-existing relations between the input tokens of the model. Starting from this topological space (S, τ), we show how to distribute the open sets over the heads of our Transformer, one open set per head. Without relying on larger language models, T-RAT improves exact-match accuracy on the Text-to-SQL challenge to 62.09%, compared to the baseline models RAT-SQL (57.2%) and Light RAT-SQL (60.25%).
Keywords: Deep learning, Natural Language Processing, Neural Semantic Parsing, Relation Aware Transformer, RAT-SQL, Text-To-SQL Transformer.
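To make the construction in the abstract concrete, the minimal sketch below (Python, illustrative only and not the authors' implementation) generates a topology τ from a hypothetical subbasis of pre-existing relation types and assigns each resulting open set to one attention-head index; the relation names, the subbasis, and the one-head-per-open-set assignment policy are assumptions made for this example only.

    # Illustrative sketch (assumption, not T-RAT's published code).
    from itertools import combinations

    # Hypothetical pre-existing relation types between question tokens and schema items.
    S = {"q-col-exact-match", "q-col-partial-match", "q-tab-exact-match",
         "col-tab-primary-key", "col-tab-foreign-key"}
    # Hypothetical subbasis of subsets of S used to generate the topology tau.
    SUBBASIS = [
        frozenset({"q-col-exact-match", "q-col-partial-match"}),
        frozenset({"q-tab-exact-match"}),
        frozenset({"col-tab-primary-key", "col-tab-foreign-key"}),
    ]

    def generate_topology(space, subbasis):
        """Return the open sets of the topology generated by `subbasis` on the finite set `space`."""
        # Finite intersections of subbasis elements form a basis.
        basis = set()
        for r in range(1, len(subbasis) + 1):
            for combo in combinations(subbasis, r):
                basis.add(frozenset.intersection(*combo))
        # Unions of basis elements (finite unions suffice because `space` is finite) are the open sets.
        opens = {frozenset(), frozenset(space)}
        for r in range(1, len(basis) + 1):
            for combo in combinations(basis, r):
                opens.add(frozenset().union(*combo))
        return opens

    # Assign each open set to one attention-head index (one head per open set).
    open_sets = sorted(generate_topology(S, SUBBASIS), key=lambda o: (len(o), sorted(o)))
    head_of_open_set = {open_set: head for head, open_set in enumerate(open_sets)}
    for open_set, head in head_of_open_set.items():
        print(f"head {head}: restricted to relations {sorted(open_set)}")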
References:
[1] T. Scholak, R. Li, D. Bahdanau, H. de Vries, and C. Pal, “DuoRAT: Towards Simpler Text-to-SQL Models,” Oct. 2020, doi: 10.18653/v1/2021.naacl-main.103.
[2] W. Hou and Y. Nie, “Seq2seq-Attention Question Answering Model.”
[3] O. Goldman, V. Latcinnik, U. Naveh, A. Globerson, and J. Berant, “Weakly-supervised Semantic Parsing with Abstract Examples,” Nov. 2017, [Online]. Available: http://arxiv.org/abs/1711.05240.
[4] X. V. Lin, R. Socher, and C. Xiong, “Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing,” Dec. 2020, [Online]. Available: http://arxiv.org/abs/2012.12627.
[5] I. Gur, S. Yavuz, Y. Su, and X. Yan, “DialSQL: Dialogue Based Structured Query Generation.”
[6] X. Xu, C. Liu, and D. Song, “SQLNet: Generating Structured Queries from Natural Language Without Reinforcement Learning,” Nov. 2017, [Online]. Available: http://arxiv.org/abs/1711.04436.
[7] B. Wang, R. Shin, X. Liu, O. Polozov, and M. Richardson, “RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers,” 2020. [Online]. Available: https://github.com/Microsoft/rat-sql.
[8] N. M. Ndongala, “Light RAT-SQL: A RAT-SQL with More Abstraction and Less Embedding of Pre-existing Relations,” Texila Int. J. Acad. Res., vol. 10, no. 2, pp. 1–11, 2023, doi: 10.21522/tijar.2014.10.02.art001.
[9] G. Huilin, G. Tong, W. Fan, and M. Chao, “Bidirectional attention for SQL generation,” in 2019 IEEE 4th International Conference on Cloud Computing and Big Data Analytics, ICCCBDA 2019, Institute of Electrical and Electronics Engineers Inc., Apr. 2019, pp. 676–682. doi: 10.1109/ICCCBDA.2019.8725626.
[10] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, “ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators,” Mar. 2020, [Online]. Available: http://arxiv.org/abs/2003.10555.
[11] M. Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension,” Oct. 2019, [Online]. Available: http://arxiv.org/abs/1910.13461.
[12] M. Shoeybi, M. Patwary, R. Puri, P. Legresley, J. Casper, and B. Catanzaro, “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism,” 2020. [Online]. Available: https://github.com/.
[13] T. B. Brown et al., “Language Models are Few-Shot Learners,” 2020. [Online]. Available: https://commoncrawl.org/the-data/.
[14] Z. Lan et al., “ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations,” 2020. [Online]. Available: https://github.com/google-research/ALBERT.
[15] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” 2019. [Online]. Available: https://github.com/codelucas/newspaper.
[16] V. Zhong, C. Xiong, and R. Socher, “Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning,” Aug. 2017, [Online]. Available: http://arxiv.org/abs/1709.00103.
[17] A. Vaswani et al., “Attention Is All You Need,” Jun. 2017, [Online]. Available: http://arxiv.org/abs/1706.03762.
[18] O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer Networks,” Jun. 2015, [Online]. Available: http://arxiv.org/abs/1506.03134.
[19] Z. Tu, Z. Lu, Y. Liu, X. Liu, and H. Li, “Modeling coverage for neural machine translation,” in 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers, Jan. 2016, pp. 76–85. doi: 10.18653/v1/p16-1008.
[20] P. Shaw, J. Uszkoreit, and A. Vaswani, “Self-Attention with Relative Position Representations,” 2018.
[21] L. Zehui, P. Liu, L. Huang, J. Chen, X. Qiu, and X. Huang, “DropAttention: A Regularization Method for Fully Connected Self-Attention Networks,” Jul. 2019, Accessed: Apr. 04, 2022. [Online]. Available: http://arxiv.org/abs/1907.11065.
[22] E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, “On the dangers of stochastic parrots: Can language models be too big?” in FAccT 2021 - Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, Inc, Mar. 2021, pp. 610–623. doi: 10.1145/3442188.3445922.
[23] P. He, X. Liu, J. Gao, and W. Chen, “DeBERTa: Decoding-enhanced BERT with Disentangled Attention,” 2021.
[24] W. Fedus, B. Zoph, and N. Shazeer, “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity,” 2022.
[25] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” 2018.
[26] V. Sanh, L. Debut, J. Chaumond, and T. Wolf, “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter,” Oct. 2019, [Online]. Available: http://arxiv.org/abs/1910.01108.
[27] Y. Liu et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” Jul. 2019, [Online]. Available: http://arxiv.org/abs/1907.11692.
[28] H. Li, J. Zhang, C. Li, and H. Chen, “RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL,” Feb. 2023, [Online]. Available: http://arxiv.org/abs/2302.05965.
[29] L. Zhao, H. Cao, and Y. Zhao, “GP: Context-free Grammar Pre-training for Text-to-SQL Parsers,” Jan. 2021, [Online]. Available: http://arxiv.org/abs/2101.09901.
[30] P. Shi et al., “Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training,” Dec. 2020, [Online]. Available: http://arxiv.org/abs/2012.10309.
[31] T. Yu et al., “GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing,” Sep. 2020, [Online]. Available: http://arxiv.org/abs/2009.13845.
[32] X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, and M. Richardson, “Structure-Grounded Pretraining for Text-to-SQL,” Oct. 2020, doi: 10.18653/v1/2021.naacl-main.105.
[33] P. Yin and G. Neubig, “TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation,” Oct. 2018, [Online]. Available: http://arxiv.org/abs/1810.02720.
[34] P. Yin, C. Zhou, J. He, and G. Neubig, “StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing.” [Online]. Available: http://pcyin.me/struct.
[35] L. Dong and M. Lapata, “Language to Logical Form with Neural Attention,” Jan. 2016, [Online]. Available: http://arxiv.org/abs/1601.01280.
[36] L. Dong and M. Lapata, “Coarse-to-Fine Decoding for Neural Semantic Parsing,” May 2018, [Online]. Available: http://arxiv.org/abs/1805.04793.
[37] A. Gopalan et al., “Neural Structured Learning: Training Neural Networks with Structured Signals,” in WSDM 2021 - Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021. doi: 10.1145/3437963.3441666.
[38] A. Gopalan et al., “Neural Structured Learning,” 2020. doi: 10.1145/3394486.3406701.
[39] B. Bogin, M. Gardner, and J. Berant, “Global Reasoning over Database Structures for Text-to-SQL Parsing,” 2019.
[40] Y. Ma and J. Tang, “Graph Neural Networks in Natural Language Processing,” in Deep Learning on Graphs, 2021. doi: 10.1017/9781108924184.015.
[41] B. Hui et al., “S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers,” Mar. 2022, [Online]. Available: http://arxiv.org/abs/2203.06958.
[42] R. Cai, J. Yuan, B. Xu, and Z. Hao, “SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL,” Oct. 2021, [Online]. Available: http://arxiv.org/abs/2111.00653.
[43] R. Cao, L. Chen, Z. Chen, Y. Zhao, S. Zhu, and K. Yu, “LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations,” Jun. 2021, [Online]. Available: http://arxiv.org/abs/2106.01093.
[44] OpenAI et al., “GPT-4 Technical Report,” vol. 4, pp. 1–100, 2023, [Online]. Available: http://arxiv.org/abs/2303.08774.
[45] M. Pourreza and D. Rafiei, “DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction,” no. NeurIPS, pp. 1–34, 2023, [Online]. Available: http://arxiv.org/abs/2304.11015.
[46] D. Gao et al., “Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation,” 2023, [Online]. Available: http://arxiv.org/abs/2308.15363.
[47] D. A. Dahl et al., “Expanding the Scope of the ATIS Task: The ATIS-3 Corpus.”
[48] Y. Gan et al., “Towards robustness of text-to-SQL models against synonym substitution,” ACL-IJCNLP 2021 - 59th Annu. Meet. Assoc. Comput. Linguist. 11th Int. Jt. Conf. Nat. Lang. Process. Proc. Conf., pp. 2505–2515, 2021, doi: 10.18653/v1/2021.acl-long.195.
[49] P. Utama et al., “An End-to-end Neural Natural Language Interface for Databases,” 2018, [Online]. Available: http://arxiv.org/abs/1804.00401.
[50] T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task,” Sep. 2018, [Online]. Available: http://arxiv.org/abs/1809.08887.
[51] T. Yu et al., “SParC: Cross-domain semantic parsing in context,” ACL 2019 - 57th Annu. Meet. Assoc. Comput. Linguist. Proc. Conf., pp. 4511–4523, 2020, doi: 10.18653/v1/p19-1443.
[52] X. Yu et al., “Dataset and enhanced model for eligibility criteria-to-SQL semantic parsing,” LREC 2020 - 12th Int. Conf. Lang. Resour. Eval. Conf. Proc., no. May, pp. 5829–5837, 2020.
[53] H. Zhang et al., “CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset,” Proc. Annu. Meet. Assoc. Comput. Linguist., pp. 6970–6983, 2023, doi: 10.18653/v1/2023.findings-acl.435.
[54] Y. Gan, X. Chen, and M. Purver, “Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization,” EMNLP 2021 - 2021 Conf. Empir. Methods Nat. Lang. Process. Proc., pp. 8926–8931, 2021, doi: 10.18653/v1/2021.emnlp-main.702.
[55] C. T. Hemphill, J. J. Godfrey, and G. R. Doddington, “The ATIS Spoken Language Systems Pilot Corpus.”
[56] Q. Min, Y. Shi, and Y. Zhang, “A pilot study for Chinese SQL semantic parsing,” EMNLP-IJCNLP 2019 - 2019 Conf. Empir. Methods Nat. Lang. Process. 9th Int. Jt. Conf. Nat. Lang. Process. Proc. Conf., pp. 3652–3658, 2019, doi: 10.18653/v1/d19-1377.
[57] S. Davis and P. S. Meltzer, “GEOquery: A bridge between the Gene Expression Omnibus (GEO) and BioConductor,” Bioinformatics, vol. 23, no. 14, pp. 1846–1847, Jul. 2007, doi: 10.1093/bioinformatics/btm254.
[58] T. Shi, C. Zhao, J. Boyd-Graber, H. Daumé, and L. Lee, “On the potential of lexico-logical alignments for semantic parsing to SQL queries,” Find. Assoc. Comput. Linguist. Find. ACL EMNLP 2020, pp. 1849–1864, 2020, doi: 10.18653/v1/2020.findings-emnlp.167.
[59] M. Singh et al., “CL Scholar: The ACL Anthology Knowledge Graph Miner,” NAACL HLT 2018 - 2018 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. Proc. Demonstr. Sess., pp. 16–20, 2018, doi: 10.18653/v1/n18-5004.
[60] A. Suhr, M. W. Chang, P. Shaw, and K. Lee, “Exploring unexplored generalization challenges for cross-database semantic parsing,” Proc. Annu. Meet. Assoc. Comput. Linguist., pp. 8372–8388, 2020, doi: 10.18653/v1/2020.acl-main.742.
[61] L. R. Tang and R. J. Mooney, “Automated Construction of Database Interfaces: Integrating Statistical and Relational Learning for Semantic Parsing,” 1996.
[62] J. Munkres, Topology. Pearson College Div, 2000. [Online]. Available: https://www.amazon.com/Topology-2nd-James-Munkres/dp/0131816292.
[63] P. Yin and G. Neubig, “A Syntactic Neural Model for General-Purpose Code Generation,” Apr. 2017, [Online]. Available: http://arxiv.org/abs/1704.01696.
[64] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky, “The Stanford CoreNLP Natural Language Processing Toolkit,” 2014.
[65] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global Vectors for Word Representation,” 2014. [Online]. Available: http://nlp.
[66] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, 1997, doi: 10.1162/neco.1997.9.8.1735.
[67] A. Paszke et al., “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” 2019.
[68] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” Dec. 2014, [Online]. Available: http://arxiv.org/abs/1412.6980.
[69] J. Guo et al., “Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation,” 2019.