Custom data file format and upload
External text data that you have acquired from public sources, including social media, and internal data created by your business can be organised as a custom data file and uploaded to our platform for processing into a dataset. The following sections describe the supported data format, the upload settings to select, and how to upload the data to our platform.
What data format is supported?
We currently support uploading a CSV file containing text data, up to 10 MB in size. If you have larger files, please contact us.
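As a rough pre-upload check, the size limit above can be tested locally before you try to upload. This is a minimal sketch, not part of the platform: the function name is hypothetical, and it assumes "10 MB" means 10 * 1024 * 1024 bytes, which may not match the platform's exact limit.

```python
import os

# Assumption: "10 MB" is treated as 10 * 1024 * 1024 bytes;
# the platform's exact limit may be defined differently.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def is_within_upload_limit(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return True if `path` has a .csv extension and is no larger than `max_bytes`."""
    is_csv = path.lower().endswith(".csv")
    return is_csv and os.path.getsize(path) <= max_bytes
```

If the check fails because of size, the file falls into the "larger files" case described above.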
How do I create a CSV text data file and upload it?
- If you are using Windows, open a blank Excel file.
- Create column 1 with “name” as the header.
- Create column 2 with “message” as the header.
- Under the “name” header, populate column 1 with user IDs, numeric IDs, or any other identifiers that suit your scenario.
- Under the “message” header, populate column 2 with the actual text content.
- Save the file as a CSV file with UTF-8 encoding. (Please refer to the video demo below when saving from a Windows system.)
- You may also create CSV files programmatically, as your data scenario requires.
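For the programmatic route, the following Python sketch writes a file with the two headers described in the steps above, saved with UTF-8 encoding. The function name, the file name custom_data.csv, and the sample rows are illustrative assumptions, not values required by the platform.

```python
import csv


def write_text_csv(path, rows):
    """Write (name, message) pairs to a UTF-8 CSV file with the expected headers."""
    # newline="" lets the csv module control line endings, as the Python docs recommend.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "message"])
        for name, message in rows:
            writer.writerow([name, message])


# Hypothetical sample data, for illustration only.
sample_rows = [
    ("user_001", "The delivery was quick, and the packaging was great."),
    ("user_002", "Support took three days to reply."),
]
write_text_csv("custom_data.csv", sample_rows)
```

The csv module quotes any field that contains a comma, a quote character, or a line break, so free-form message text stays in a single column.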
To create a dataset from your own uploaded data, specify the following settings on the Create Dataset page.
Data Type selection
Please specify one of the following types based on the source and the expected quality of the data.
- Internal or Organized data: Select this option if your text data was created in a known or controlled environment internal to your organization. Typical examples are consumer interview transcripts, private community commentaries, qualitative focus group or panel notes, and email communication logs.
- External & Noisy data: Select this option if your text data comes from public sources such as social media, web reviews, blogs, and surveys (verbatim responses).
Data Processing options
Please specify one of the following processing options based on the nature of the content and its language.
- Long Content: Select this when each text entry in the dataset is, on average, more than 4 sentences long. This is typically the case for content sourced from blogs, news articles, interviews, etc.
- Language Filter: Enable this for noisy external data (social media or web reviews) or internal text data only if you expect text in languages other than English (e.g. web review data containing both English and Spanish reviews).
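If you are unsure whether the Long Content option applies, a rough local estimate of the average number of sentences per entry can help. The sketch below is a heuristic under stated assumptions: the function name is hypothetical, the file is assumed to have a "message" column as described earlier, and sentences are counted with a naive punctuation split that is unlikely to match how the platform itself measures length.

```python
import csv
import re


def average_sentences_per_message(path):
    """Estimate the mean number of sentences per 'message' entry in a CSV file."""
    counts = []
    # utf-8-sig reads UTF-8 files with or without a byte order mark.
    with open(path, encoding="utf-8-sig", newline="") as f:
        for row in csv.DictReader(f):
            text = (row.get("message") or "").strip()
            if not text:
                continue  # skip empty entries
            # Naive split: a sentence ends at ., ! or ? followed by whitespace or end of text.
            sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
            counts.append(len(sentences))
    return sum(counts) / len(counts) if counts else 0.0
```

A result above 4 suggests that Long Content is worth selecting. Abbreviations and unpunctuated text will skew the count, so treat the number as a hint only.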
When you have selected one or more of the above options based on your data, please review and upload your file to create a dataset.