MACHINE LANGUAGE AUTOMATION

Regular expression programming in Perl, Python, and Java for text processing, covering natural languages (NLP; English, UTF-8), biomedical language, unstructured textual datasets, and machine languages (automated computer meta-coding, for ~100% automation of data pipelines).


NATURAL LANGUAGE PROCESSING

Natural language processing (NLP; English, UTF-8), Apache UIMA and OpenNLP, Stanford POS tagger.

Text-processing expert in natural languages (NLP), biomedical texts (UMLS), and machine languages (automation).

Transforming and enriching unstructured data sources using distributed regex text processing and NLP via PySpark, both in Zeppelin and from the command-line interface (CLI).
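
Illustrative sketch of regex-based enrichment of unstructured text, in plain Python so it runs without a cluster. The pattern, the `enrich` helper, and the sample records are hypothetical and only approximate the shape of ICD-10-style codes; they are not taken from any production pipeline.

```python
import re

# Hypothetical pattern: tokens shaped like ICD-10 codes, e.g. "E11.9" or "I10".
# This is a shape check only; it does not validate codes against a terminology.
ICD10_LIKE = re.compile(r"\b[A-TV-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def enrich(record: str) -> dict:
    """Return the raw text together with any code-like tokens found in it."""
    return {"text": record, "codes": ICD10_LIKE.findall(record)}


# Made-up sample records, for illustration only.
records = [
    "Pt seen for type 2 diabetes (E11.9) and hypertension (I10).",
    "No coded diagnoses in this note.",
]

enriched = [enrich(r) for r in records]
for row in enriched:
    print(row["codes"], "<-", row["text"])
```

Because `enrich` is a pure function of a single record, the same function could in principle be applied per record in a distributed job (for example through an RDD `map` in PySpark); that wiring is not shown here.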

Creating full-text indexes with real-time information retrieval, using Lucene.


BIOMEDICAL TEXT PROCESSING

Biomedical text processing, UMLS and its extension to 30M unique biomedical terms, Apache cTAKES.

Extensive analytic experience with the National Library of Medicine's Unified Medical Language System (UMLS) and its Metathesaurus source vocabularies, a critical building block for all biomedical text processing and biomedical NLP.

Processing and creating semantic terminologies, including SNOMED, RXNORM, HL7V3, ICD10, ICD9CM, LOINC, OMIM, DrugBank, MDDB, Gene Ontology, Foundational Model of Anatomy, Micromedex, MeSH, QMR, Current Dental Terminology.

Extensive experience working with health insurance claims data.

Multi-database mining of clinical data repositories along with publication knowledge bases.