Quaesia database description

MeSH Concept Schema (cMESH.json)
This table describes the structure of the entities derived from MeSH (Medical Subject Headings) as they are stored in the database.
cMESH integrates approximately 355,000 MeSH records. Each Supplementary Concept Record is explicitly linked to its closest-fitting Descriptor, which serves as its direct parent in the database hierarchy.
Note on Tree Number: To maintain hierarchical structure for Supplementary Concepts (which do not have native Tree Numbers), a Tree Number is synthesized. This is achieved by taking the parent's Tree Number and concatenating it with a dot (.) and the MeSH Identifier (e.g. D05.750.716.822.111.C472990).
Code | Description | Data type
_id | Custom MeSH ID (cMESHId): The primary key. A custom integer representation derived from the original MeSH Unique Identifier (UI) for efficient indexing. Encoding: a leading 'C' in the UI becomes 3 and a leading 'D' becomes 4 (e.g. "C030544" → 3030544) | Integer
p | Preferred Name: The official or primary name for the MeSH concept/descriptor | String
d | Description: The text definition or scope note for the MeSH concept | String
lTN | Tree Numbers: A mapping of MeSH Tree Numbers to the count of direct children beneath that specific branch | JSON Object ({String: Integer...})
lC | Direct Children: A list of the cMESHId for all direct child concepts/nodes in the MeSH hierarchy | Array ([Integer...])
lt | Synonyms/Alternate Terms: A list of alternative terms associated with the concept | Array ([String...])
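The _id encoding described above can be sketched in Python as follows. This is a minimal illustration, not Quaesia's actual code: the function names are hypothetical, and the reverse conversion assumes the encoding is a plain substitution of the leading letter, which is what the "C030544" → 3030544 example suggests.

```python
# Leading-letter substitution taken from the schema: 'C' -> 3, 'D' -> 4.
PREFIX_TO_DIGIT = {"C": "3", "D": "4"}
DIGIT_TO_PREFIX = {digit: prefix for prefix, digit in PREFIX_TO_DIGIT.items()}


def ui_to_cmesh_id(ui: str) -> int:
    """Convert a MeSH UI such as "C030544" to its integer cMESHId (hypothetical helper)."""
    prefix, digits = ui[:1], ui[1:]
    if prefix not in PREFIX_TO_DIGIT or not digits.isdigit():
        raise ValueError(f"unsupported MeSH UI: {ui!r}")
    return int(PREFIX_TO_DIGIT[prefix] + digits)


def cmesh_id_to_ui(cmesh_id: int) -> str:
    """Assumed inverse: map the leading digit back to its letter."""
    text = str(cmesh_id)
    if text[0] not in DIGIT_TO_PREFIX:
        raise ValueError(f"not a cMESHId: {cmesh_id!r}")
    return DIGIT_TO_PREFIX[text[0]] + text[1:]


print(ui_to_cmesh_id("C030544"))
print(cmesh_id_to_ui(3030544))
```

Because the leading digit is never zero, the zero padding of the original UI survives the round trip.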
Interaction Concept Schema (cIT.json)
This table describes the controlled vocabulary used by Quaesia for classifying interaction types between entities.
Code | Description | Data type
_id | Custom Interaction Term ID (cITId): The primary key. An ad hoc integer identifier used internally by Quaesia to represent a specific interaction concept | Integer
p | Preferred Name: The common name used to describe this interaction concept (e.g., "negative", "upstream", "activated/caused") | String
l | Associated Terms: A list of synonyms or related phrases used to identify this interaction concept | Array ([String...])
c | Category: The broad classification or group to which this interaction concept belongs | String
Publication Schema (cPMID.json)
This table stores metadata and structural data for biomedical scientific publications, indexed by their PubMed ID (PMID).
Code | Description | Data type
_id | PubMed ID (PMID): The unique identifier for the publication, serving as the primary key | Integer
jt | Journal Title: The official title of the journal where the article was published | String
y | Publication Year: The year the article was published | Integer
lS | Section Sentences: A list of text sections and their corresponding sentence identifiers (cSentId). The section name indicates the type: | Array of Tuples ([(String, [Float...])...])
  • "T" → Title
  • "A" → Abstract
  • otherwise an author-defined section name
lA | Authors: A list of authors identified by their Custom Author ID (cAuthorsId), each paired with the index of their affiliation in the laff list |
laff | Affiliations: A list of all unique affiliation strings for the authors in this publication | Array ([String...])
PMC | PMC ID: The PubMed Central unique identifier | Integer
doi | DOI Link: The Digital Object Identifier link (may include the prefix) | String
Sentence Schema (cSent.json)
This table stores individual, normalized sentences extracted from biomedical publications.
Note on Normalization: Sentences have been normalized (e.g., removing double spaces and HTML tags). Consequently, the character-level positions (p) may not perfectly align with results from independent text analyses on the original, uncleaned source text.
Code | Description | Data type
_id | Custom Sentence ID (cSentId): The primary key. A composite ID calculated as PMID + (Sentence Index / 1,000,000) | Float
t | Text: The full, normalized text of the sentence | String
p | Start Position: The character-level starting index of the sentence within the concatenated publication text (Title + Abstract) | Integer
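The composite cSentId can be built and taken apart as in the sketch below. The function names are hypothetical, and the decoder rounds the fractional part because the sum is stored as a binary float and is therefore not exact.

```python
from typing import Tuple

# Divisor taken from the schema: cSentId = PMID + sentence_index / 1,000,000.
SENTENCE_SCALE = 1_000_000


def make_csent_id(pmid: int, sentence_index: int) -> float:
    """Combine a PMID and a sentence index into one float key (hypothetical helper)."""
    if not 0 <= sentence_index < SENTENCE_SCALE:
        raise ValueError("sentence index must fit in six decimal places")
    return pmid + sentence_index / SENTENCE_SCALE


def split_csent_id(csent_id: float) -> Tuple[int, int]:
    """Recover (PMID, sentence index); rounding absorbs floating-point error."""
    pmid = int(csent_id)
    sentence_index = round((csent_id - pmid) * SENTENCE_SCALE)
    return pmid, sentence_index


csent_id = make_csent_id(12345678, 3)
print(csent_id, split_csent_id(csent_id))
```

An eight-digit PMID plus six decimal places needs about 14 significant digits, which a 64-bit float can hold; comparing cSentId values through the decoded (PMID, index) pair avoids relying on exact float equality.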
Author Schema (cAuthor.json)
Code | Description | Data type
_id | Custom Author ID (cAuthorsId): The primary key. A string identifier created by concatenating the author's first name and last name with a pipe separator (FirstName + "|" + LastName) | String
lp | Published PMIDs: A list of PubMed IDs (PMIDs) corresponding to publications authored by this person | Array ([Integer...])
lM | MeSH of Interest: A list of tuples containing a Custom MeSH ID (cMESHId) and the count of PMIDs published by this author that are associated with that specific MeSH term | Array of Tuples ([(Integer, Integer)...])
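As an illustration of how the _id and lM fields relate to the other data, the sketch below builds an author key and counts MeSH concepts per author. The helper names and the mesh_by_pmid input (a mapping from PMID to the cMESHIds found in that publication) are assumptions made for this example, not part of the database.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def make_author_id(first_name: str, last_name: str) -> str:
    """Build a cAuthorsId as FirstName + "|" + LastName."""
    return first_name + "|" + last_name


def mesh_of_interest(
    pmids: Iterable[int], mesh_by_pmid: Dict[int, List[int]]
) -> List[Tuple[int, int]]:
    """Count, per cMESHId, how many of the author's PMIDs mention it."""
    counts: Counter = Counter()
    for pmid in pmids:
        # A set ensures each publication counts once per concept.
        counts.update(set(mesh_by_pmid.get(pmid, ())))
    return sorted(counts.items())


print(make_author_id("Ada", "Lovelace"))
print(mesh_of_interest([1, 2], {1: [4000001, 3030544], 2: [4000001]}))
```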
MeSH List Schema (lcMESHId.json)
This table provides predefined, categorized lists of Custom MeSH IDs (cMESHId) for specific biomedical concepts.
Code | Description | Data type
_id | List Identifier: The unique name of the predefined list, indicating the category of MeSH concepts contained within. Predefined list identifiers: | Integer or String
  • 0: All available cMESHIds
  • MESH_CID: MeSH concepts mapped to a PubChem Compound ID (CID)
  • MESH_FDAApproved: Chemical entities that are FDA-approved
  • MESH_antibody: Chemical antibodies
  • MESH_bodyPart_organ: body parts and organs
  • MESH_cell_structure: cells and biological structures
  • MESH_defect_deficiency_disorder: defects, deficiencies, and disorders
  • MESH_disease: diseases
  • MESH_human_protein: human proteins
  • MESH_infection: infections
  • MESH_phenomen_process: biological phenomena and processes
  • MESH_side_effect: side effects
  • MESH_sign_symptom: clinical signs and symptoms
  • MESH_syndrome: syndromes
  • MESH_tissue: biological tissue
l | Custom MeSH IDs (cMESHId): A list of the integer identifiers corresponding to the MeSH concepts for the specified category | Array ([Integer...])
Interaction Term List Schema (lcIT.json)
Code | Description | Data type
_id | List Identifier: The unique identifier or name of the predefined list. Predefined list categories: | Integer
  • 0: A list containing all available cITIds
l | Custom Interaction Term IDs: The list of integer identifiers (cITId) belonging to the category specified in the _id field | Array ([Integer...])
PubMed ID List Schema (lPMID.json)
This table organizes all PubMed IDs (PMIDs) indexed in the database based on the source PubMed XML file from the PubMed FTP server.
Note on Chunking: The total list was partitioned into sublists to comply with MongoDB's 16 MB BSON document size limit.
Code | Description | Data type
_id | Source File Index: The index number corresponding to the source PubMedXXXX.xml file retrieved from the PubMed FTP server | Integer
l | PubMed IDs (PMIDs): A list of all PMIDs sourced from the corresponding PubMedXXXX.xml file. Deduplication rule: If a PMID appears in multiple XML files, it is only included in the list associated with the highest source file index number | Array ([Integer...])
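The deduplication rule can be expressed as a short sketch. The function name and the input shape (a mapping from source file index to the PMIDs parsed from that file) are assumptions for illustration; the source does not describe how the rule is implemented.

```python
from typing import Dict, List


def dedupe_by_highest_file(pmids_by_file: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Keep each PMID only under the highest file index in which it appears."""
    owner: Dict[int, int] = {}
    # Visit files in ascending order so later (higher) indexes overwrite earlier ones.
    for file_index in sorted(pmids_by_file):
        for pmid in pmids_by_file[file_index]:
            owner[pmid] = file_index

    result: Dict[int, List[int]] = {file_index: [] for file_index in pmids_by_file}
    for pmid, file_index in owner.items():
        result[file_index].append(pmid)
    return result


print(dedupe_by_highest_file({1: [10, 11], 2: [11, 12]}))
```

In this example PMID 11 appears in files 1 and 2, so it is retained only in the list for file 2.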
Sentence ID List Schema (lcSentId.json)
This table organizes all Custom Sentence IDs (cSentId) indexed in the database based on the source PubMed XML file from the PubMed FTP server.
Note on Chunking: The total list was partitioned into sublists to comply with MongoDB's 16 MB BSON document size limit.
Code | Description | Data type
_id | Source File Index: The index number corresponding to the source PubMedXXXX.xml file retrieved from the PubMed FTP server | Integer
l | Custom Sentence IDs (cSentId): A list of all sentence identifiers sourced from the corresponding PubMedXXXX.xml file. Deduplication rule: If a cSentId appears in multiple XML files, it is only included in the list associated with the highest source file index number | Array ([Float...])
Author ID List Schema (lcAuthorId.json)
This table is used to chunk and organize all Custom Author IDs (cAuthorsId) into manageable sublists.
Note on Chunking: The total list was partitioned into sublists to comply with MongoDB's 16 MB BSON document size limit.
Code | Description | Data type
_id | List Index: An incremental integer number used to index and identify each chunk/sublist of author IDs | Integer
l | Author ID Sublist: A sublist containing Custom Author IDs (cAuthorsId). Each sublist is capped at a maximum of 10,000 author identifiers | Array ([String...])
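A chunking step of this kind might look like the sketch below. The function name is hypothetical, and starting the list index at 0 is an assumption; the source only states that the index is incremental and that each sublist holds at most 10,000 identifiers.

```python
from typing import Dict, List, Union

MAX_IDS_PER_CHUNK = 10_000  # cap stated in the schema


def chunk_author_ids(
    author_ids: List[str], chunk_size: int = MAX_IDS_PER_CHUNK
) -> List[Dict[str, Union[int, List[str]]]]:
    """Split a flat list of cAuthorsIds into documents with _id and l fields."""
    return [
        {"_id": index, "l": author_ids[start:start + chunk_size]}
        for index, start in enumerate(range(0, len(author_ids), chunk_size))
    ]


documents = chunk_author_ids([f"First{i}|Last{i}" for i in range(25_000)])
print([len(doc["l"]) for doc in documents])
```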
Publication Reference Index Schema (cPMID_cite_cPMID.json)
This table provides an explicit, indexed list of PubMed ID (PMID) pairs where the first publication cites the second (Reference Index).
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Citing PMID: The PubMed ID of the publication that includes the reference | Integer
2 | Cited PMID: The PubMed ID of the referenced publication | Integer
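Since each record is a single citing/cited pair, lookups in either direction require grouping the pairs first. The sketch below shows one way to do that in memory; it assumes the records are loaded as dictionaries whose keys are the strings "1" and "2", and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_citation_maps(
    records: Iterable[dict],
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Return (references, cited_by) adjacency maps from citation pair records."""
    references: Dict[int, List[int]] = defaultdict(list)
    cited_by: Dict[int, List[int]] = defaultdict(list)
    for record in records:
        citing, cited = record["1"], record["2"]
        references[citing].append(cited)  # what this publication cites
        cited_by[cited].append(citing)    # who cites this publication
    return dict(references), dict(cited_by)


pairs = [{"_id": 0, "1": 100, "2": 50}, {"_id": 1, "1": 100, "2": 60}, {"_id": 2, "1": 200, "2": 50}]
references, cited_by = build_citation_maps(pairs)
print(references[100], cited_by[50])
```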
Automatic Annotation Schema (cSent_2_autoAnno.json)
This table stores the automatic Named Entity Recognition and Disambiguation (NERD) and Relationship Extraction (RE) results for individual sentences.
Code | Description | Data type
_id | Custom Sentence ID (cSentId): The primary key. A composite identifier calculated as PMID + (Sentence Index / 1,000,000) | Float
N | Entity Detection: A list of all entities automatically detected within the sentence | Array of Objects
  • Start: Character-level start position of the entity within the sentence | Integer
  • End: Character-level end position of the entity within the sentence | Integer
  • Associated cMESHIds: List of Custom MeSH IDs (cMESHId) associated with the detected entity | Array ([Integer...])
R | Relationship Extraction: A list of relationships automatically extracted between pairs of entities in the sentence | Array of Objects
  • Start of entity 1: Character-level start position of the first entity in the pair | Integer
  • End of entity 1: Character-level end position of the first entity in the pair | Integer
  • Start of entity 2: Character-level start position of the second entity in the pair | Integer
  • End of entity 2: Character-level end position of the second entity in the pair | Integer
  • List of associated cITId (cITId, score): A list of tuples, where each tuple contains a Custom Interaction Term ID (cITId) and the confidence score for that specific relationship type | Array of Tuples ([(Integer, Integer)...])
MeSH-to-Publication Index Schema (cMESHId_citeNERD_2lPMID.json)
This table serves as a reverse index, listing all PubMed IDs (PMIDs) where a specific Custom MeSH ID (cMESHId) has been detected by the Named Entity Recognition and Disambiguation (NERD) system.
Code | Description | Data type
_id | Custom MeSH ID (cMESHId): The primary key. The integer identifier of the MeSH concept whose occurrences are being indexed | Integer
l | Publications List (PMIDs): A list of PubMed IDs where the entity corresponding to the cMESHId was identified in the text | Array ([Integer...])
Note on MongoDB Limit: Some PMID lists exceed the 16 MB limit. To successfully load this data into a MongoDB system, you will need to partition it into smaller documents (or 'chunks').
Relationship Evidence Sentence List Schema (cMESHId1_cMESHId2_cITId_lcSentId.json)
Code | Description | Data type
1 | Custom MeSH ID 1 (cMESHId1): The first MeSH concept identifier in the relationship pair. Ordering constraint: This ID is always the smaller value (cMESHId1 < cMESHId2) to ensure a unique key for the pair | Integer
2 | Custom MeSH ID 2 (cMESHId2): The second MeSH concept identifier in the relationship pair. Ordering constraint: This ID is always the larger value (cMESHId1 < cMESHId2) | Integer
t | Interaction Type ID (cITId): The Custom Interaction Term ID that defines the type of relationship found between cMESHId1 and cMESHId2 | Integer
l | Sentence ID List (cSentId): The list of Custom Sentence IDs where the relationship between this specific entity pair and interaction type was detected | Array ([Float...])
Note on MongoDB Limit: Some cSentId lists exceed the 16 MB limit. To successfully load this data into a MongoDB system, you will need to partition it into smaller documents (or 'chunks').
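When querying this table, the two concept IDs must be put into the stored order first. The sketch below builds a lookup key that respects the ordering constraint; the function name is hypothetical, and rejecting identical IDs follows from the strict inequality cMESHId1 < cMESHId2.

```python
from typing import Dict


def relation_key(mesh_a: int, mesh_b: int, cit_id: int) -> Dict[str, int]:
    """Return the fields 1, 2 and t with the smaller cMESHId placed first."""
    if mesh_a == mesh_b:
        raise ValueError("the schema requires two distinct cMESHIds")
    first, second = sorted((mesh_a, mesh_b))
    return {"1": first, "2": second, "t": cit_id}


# Both argument orders resolve to the same key.
print(relation_key(4000001, 3030544, 7))
print(relation_key(3030544, 4000001, 7))
```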
Custom MeSH to ChEBI Mapping Schema (cMESHId_map_CHEBI.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external ChEBI (Chemical Entities of Biological Interest) Identifiers.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept (often a chemical entity) being mapped | Integer
2 | ChEBI Identifier (CHEBI ID): The unique integer ID assigned by the ChEBI database for the corresponding chemical entity | Integer
Custom MeSH to ChEMBL Mapping Schema (cMESHId_map_CHEMBL.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external ChEMBL (Chemical Database of Bioactive Molecules) Identifiers.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept (often a chemical entity) being mapped | Integer
2 | ChEMBL Identifier (CHEMBL ID): The unique string ID assigned by the ChEMBL database for the corresponding bioactive molecule | String
Custom MeSH to Cell Line Mapping Schema (cMESHId_map_CL.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external Cell Line Ontology (CL) Identifiers.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept related to a cell line or cell type being mapped | Integer
2 | Cell Line Identifier (CL ID): The unique string identifier assigned by the external Cell Line Ontology | String
Custom MeSH to Ensembl Gene Mapping Schema (cMESHId_map_ENSEMBLG.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external Ensembl Gene Identifiers.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept (often a gene or protein) being mapped | Integer
2 | Ensembl Gene Identifier (ENSEMBLG): The unique string ID assigned by the Ensembl database for the corresponding gene | String
Custom MeSH to GeneId Mapping Schema (cMESHId_map_GENEID.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external NCBI Gene Identifiers (GENEID).
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept (often a gene, protein, or related entity) being mapped | Integer
2 | NCBI Gene ID (GENEID): The unique integer identifier assigned by the NCBI Gene database for the corresponding gene | Integer
Custom MeSH to Gene Ontology Mapping Schema (cMESHId_map_GO.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external Gene Ontology (GO) Identifiers, linking biomedical subject headings to functional biological concepts.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept being mapped (e.g., a specific biological process or molecule) | Integer
2 | Gene Ontology Identifier (GO ID): The unique string ID assigned by the Gene Ontology Consortium for the corresponding functional annotation (e.g., biological process, molecular function, or cellular component) | String
Custom MeSH to Original MeSH Id Mapping Schema (cMESHId_map_MESH.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external, original MeSH Unique Identifiers (UI).
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept being mapped | Integer
2 | MeSH Unique Identifier (MESH ID): The original unique identifier (UI) assigned by the National Library of Medicine (NLM) for the corresponding MeSH concept | String
Custom MeSH to PubChem CID Mapping Schema (cMESHId_map_PubChemCID.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external PubChem Compound Identifiers (CIDs), linking biomedical subject headings to chemical structure records.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept (usually a chemical or drug) being mapped | Integer
2 | PubChem Compound Identifier (CID): The unique integer ID assigned by the PubChem database for the corresponding compound | Integer
Custom MeSH to MeSH Tree Number Mapping Schema (cMESHId_map_TN.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the hierarchical MeSH Tree Numbers (TN), defining the concept's position in the MeSH hierarchy.
Note on Tree Number: To maintain hierarchical structure for Supplementary Concepts (which do not have native Tree Numbers), a Tree Number is synthesized. This is achieved by taking the parent's Tree Number and concatenating it with a dot (.) and the MeSH Identifier (e.g. D05.750.716.822.111.C472990).
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept being mapped | Integer
2 | MeSH Tree Number (TN): The unique string identifier that specifies the hierarchical location of the MeSH concept (e.g., C04.557.568.125) | String
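The synthesized Tree Number for a Supplementary Concept, and the ancestor chain implied by any dot-separated Tree Number, can be sketched as follows. Both function names are hypothetical; the ancestor walk relies on the usual MeSH convention that removing the last segment of a Tree Number gives the parent.

```python
from typing import List


def synthesize_tree_number(parent_tree_number: str, mesh_ui: str) -> str:
    """Join the parent's Tree Number and the MeSH identifier with a dot."""
    return f"{parent_tree_number}.{mesh_ui}"


def ancestor_tree_numbers(tree_number: str) -> List[str]:
    """List ancestors from the direct parent up to the top-level branch."""
    parts = tree_number.split(".")
    return [".".join(parts[:length]) for length in range(len(parts) - 1, 0, -1)]


print(synthesize_tree_number("D05.750.716.822.111", "C472990"))
print(ancestor_tree_numbers("C04.557.568.125"))
```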
Custom MeSH to Taxonomy ID Mapping Schema (cMESHId_map_TaxId.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external NCBI Taxonomy Identifiers (TaxId), linking biomedical subject headings to specific species or taxonomic groups.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept being mapped (e.g., a specific organism, protein, or disease related to a species) | Integer
2 | Taxonomy Identifier (TaxId): The unique string ID assigned by the NCBI Taxonomy database for the corresponding species or taxonomic node | String
Custom MeSH to UMLS Mapping Schema (cMESHId_map_UMLS.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external UMLS Concept Identifiers (CUIs), linking MeSH terms to the broader biomedical terminology within the Unified Medical Language System.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept being mapped | Integer
2 | UMLS Concept Identifier (CUI): The unique string identifier assigned by the UMLS Metathesaurus (often formatted as Cxxxxxxx) | String
Custom MeSH to UniProt Mapping Schema (cMESHId_map_UNIPROT.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external UniProt Knowledgebase Identifiers, linking biomedical subject headings to specific protein records.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept (often a protein or peptide) being mapped | Integer
2 | UniProt Identifier: The unique string ID assigned by the UniProt Knowledgebase for the corresponding protein sequence record (e.g., a UniProt Accession number) | String
Custom MeSH to UniProt Gene Name Mapping Schema (cMESHId_map_UNIPROTGeneName.json)
This table provides a mapping between Quaesia's internal Custom MeSH Identifiers (cMESHId) and the external UniProt Preferred Gene Names.
Code | Description | Data type
_id | Record Index: An incremental integer number used as a sequential index for the records in this mapping table | Integer
1 | Custom MeSH ID (cMESHId): The integer identifier for the MeSH concept (usually a gene or protein) being mapped | Integer
2 | UniProt Gene Name: The preferred, official gene symbol associated with the corresponding protein record in the UniProt Knowledgebase | String
Term to Custom MeSH ID List Schema (t_is_lcMESHId.json)
This table serves as a reverse index, mapping a specific term (word or phrase) to all associated Custom MeSH Identifiers (cMESHId) found within the database. This allows for quick retrieval of all MeSH concepts related to a search term.
Code | Description | Data type
_id | Term: The unique word or phrase (e.g., a synonym or root) that serves as the primary key for the index | String
l | Custom MeSH ID List: A list of the Custom MeSH IDs (cMESHId) that are linked to the defined term | Array ([Integer...])
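To show the shape of such a reverse index, the sketch below derives one from cMESH-style documents using their p (preferred name) and lt (synonyms) fields. This is an illustration only: the function name is hypothetical, terms are matched exactly with no normalization, and the real index may also contain roots or other terms that this sketch does not produce.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_term_index(concepts: Iterable[dict]) -> Dict[str, List[int]]:
    """Map each preferred name or synonym to the sorted cMESHIds that use it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for concept in concepts:
        for term in [concept["p"], *concept.get("lt", [])]:
            index[term].add(concept["_id"])
    return {term: sorted(ids) for term, ids in index.items()}


# Made-up sample documents for illustration only.
sample = [
    {"_id": 4000001, "p": "alpha", "lt": ["a", "shared"]},
    {"_id": 3030544, "p": "beta", "lt": ["shared"]},
]
print(build_term_index(sample)["shared"])
```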