Context-Aware Metadata Enrichment in Enterprise Master Data Management: A Natural Language Processing Approach for EBX Repositories

Nagender Yamsani

Abstract

Organizations that rely on enterprise master data platforms often encounter persistent limitations in metadata quality, particularly in semantic clarity, contextual relevance, and cross-domain interpretability. This study examines the use of natural language processing to enable context-aware metadata enrichment within EBX repositories, addressing the challenge of transforming fragmented descriptive fields into structured, meaningful knowledge assets. The purpose of the research is to design and evaluate a systematic enrichment approach that interprets textual attributes, infers relationships, and improves metadata usability for governance, integration, and analytics. A mixed-methods design was applied, combining architectural modeling, controlled prototype implementation, and qualitative assessment of stewardship workflows in simulated enterprise scenarios. The observed outcomes show measurable improvements in classification consistency, metadata coverage, and retrieval efficiency, together with reduced dependence on manual interpretation. The proposed framework introduces a scalable enrichment pipeline that integrates linguistic analysis, semantic mapping, and governance-driven validation within the operational lifecycle of EBX master data. The study argues that embedding language-aware intelligence into metadata management practices can significantly strengthen data reliability and transparency. The findings provide a foundation for future research on semantic infrastructure in enterprise data ecosystems and offer practical guidance for organizations seeking to modernize metadata governance in complex master data environments.
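
As an illustration only, the three pipeline stages named above (linguistic analysis, semantic mapping, and governance-driven validation) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the framework evaluated in the study: the vocabulary, the concept names, the `enrich` function, and the `min_coverage` threshold are all hypothetical, a regular-expression tokenizer stands in for a full NLP toolkit, and no EBX API is called.

```python
import re
from dataclasses import dataclass

# Hypothetical controlled vocabulary: surface term -> canonical concept.
# A real deployment would draw this from a governed glossary or ontology.
VOCABULARY = {
    "customer": "Party.Customer",
    "client": "Party.Customer",
    "supplier": "Party.Supplier",
    "vendor": "Party.Supplier",
    "invoice": "Finance.Invoice",
}


@dataclass
class Enrichment:
    field_name: str
    concepts: list       # canonical concepts inferred from the description
    coverage: float      # share of tokens matched in the vocabulary
    needs_review: bool   # True when a data steward should validate the result


def analyze(text):
    """Linguistic analysis stand-in: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def map_concepts(tokens):
    """Semantic mapping: vocabulary lookup, order-preserving and de-duplicated."""
    concepts = []
    for token in tokens:
        concept = VOCABULARY.get(token)
        if concept is not None and concept not in concepts:
            concepts.append(concept)
    return concepts


def enrich(field_name, description, min_coverage=0.2):
    """Run the three stages and flag weak results for steward review."""
    tokens = analyze(description)
    matched = [t for t in tokens if t in VOCABULARY]
    coverage = len(matched) / len(tokens) if tokens else 0.0
    concepts = map_concepts(tokens)
    # Governance-driven validation (assumed rule): review when nothing was
    # mapped or when too little of the description was understood.
    needs_review = not concepts or coverage < min_coverage
    return Enrichment(field_name, concepts, coverage, needs_review)


if __name__ == "__main__":
    print(enrich("CUST_DESC", "Primary client or vendor contact"))
    print(enrich("MISC", "misc notes"))
```

The review flag reflects one possible design choice: enrichment suggestions are accepted automatically only when enough of the description is recognized, and everything else is routed to a steward, which keeps manual effort focused on ambiguous fields.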

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

A Double-Blind Peer Reviewed Journal

International Journal of Sustainable Development in Computing Science