pdyc 10 days ago

crashed with big file https://www.papaparse.com/resources/files/big.csv i evaluated sqlite for my tool and wrote about it here https://newbeelearn.com/blog/million-rows-csv-debug-story/

BTW i checked your sqlite.wasm to be quite compact, just 340kb are you doing custom compilation.

  • freakynit 10 days ago

    Hey, thanks for trying out.

    I did check out with the csv you linked. It worked for me. https://ibb.co/cb3SFbR. Before making it live, I did already test it with 100MB+ CSV's having like 10+ columns.

    I also checked my sentry integration, but couldn't find any errors to your issue. Seems to me like OS/browser level ram restriction. In case you have anything to help me with debug this, do share..

    Im using regular sqlite-wasm. No custom stuff.

    Thanks..

    • pdyc 9 days ago

      i tested on machine with 4gb ram, also i modified the query to remove the limit i.e. "select * from data"

      can you let me know from where you are picking sqlite wasm? i checked here https://sqlite.org/download.html and sqlite wasm shared here is approx 830kb

      Edit: never mind checked here https://github.com/sql-js/sql.js/releases/tag/v1.12.0 and its indeed around 330kb.

      • kfajdsl 9 days ago

        Loading that CSV and running `select * from data` left me with 879MB of memory usage on the tab.

    • itsbrex 9 days ago

      Works on my iPhone with the ‘big.csv’ file as well

      • freakynit 9 days ago

        Thanks a lot for trying it out and verifying :)

victor106 9 days ago

This is cool. Any pointers on how you built this?

  • freakynit 9 days ago

    Hey, thanks...

    Almost all the major work is done by sqlite (packaged as wasm binary), with second most by Papaparse library. I just joined these two and put up a decent web ui on top. The only manual thing done by me is schema parsing, which is based on first 1000 rows, and downloading functionality.

    • nbbaier 8 days ago

      How are you doing the schema parsing?

      • freakynit 8 days ago

        (1.) Since this is a javascript environment, the types are pretty minimal and just one of each top level type (unlike languages like java or C, which, for example, clould have int, float, double, etc. for just one `number` top-level type). These, here, in javascript, are: number, string, boolean, object, null/undefined.

        (2.) Then I use `typeof` operator for each field to determine appropriate javascript type.

        (3.) Then, I map these javascript types to sqlite types using this mapping:::

        const typeMapping = { 'number': 'REAL', 'string': 'TEXT', 'boolean': 'INTEGER', 'object': 'TEXT', 'undefined': 'TEXT' };

        (4.) Then, I generate create table sql's fields declaration part. This also handles an edge case where if first 1000 rows (used for schema generation) of a particular field are all null/undefined, I default to using sqlite's TEXT type using this:::

        const columns = fields.map(field => `'${field}' ${typeMapping[columnTypes[field]] || 'TEXT'}`);

        (5.) Finally, I create sqlite table:::

        const createTableSQL = `CREATE TABLE data (${columns.join(', ')});`;

        db = new SQL.Database();

        db.run(createTableSQL);

        Thanks..