Uploading CSV data to update entire database

Hi, I'm hoping an experienced Xano user might be able to offer a solution. I am working for a client who has a Launch account, and they have a main table with 100,000 records (each with about 25 fields).
They want to be able to upload a CSV file (actually 3) and update the table. That wouldn't really be a problem (thanks to a snippet by Chris) if the file(s) only contained new records, but the file(s) are a dump from other systems and contain all 100,000 records of the data (old, new and updated). I could potentially search for the equivalent record and overwrite it, but I am pretty sure that I would run into memory problems, and I imagine it would be awfully slow. Please correct me if I'm wrong.
My other desperate thought was that I could create a new table (or clone the structure of the existing one), upload the data, and then delete the original table and rename the new one, but I don't think the Xano API currently allows DDL operations.

Any thoughts or advice/solutions would be greatly appreciated.
thanks,
bren

Answers

  • Pawel Magdanski Member ✭✭✭

    I don't think that deleting and recreating the table is a good idea here. How fast do you need this update to be?

  • Ray Deck Trusted Xano Expert ✭✭✭

    Data synchronization is tricky! Also, loading and working with a CSV of 2.5 million fields (100k records x 25 fields each) can cause memory challenges.

    Some principles I have found helpful:

    1) Distribute the work of updating records into endpoints that handle, say, a page (1000 records, for example) of data at a time. That way you isolate the memory impact of this manipulation (see the first sketch after this list).

    2) After the initial data load, try looking up the records in the database and comparing them with the new data to see if the information has to be updated at all. You might find that you really only have to update, say, 10k records instead of all 100k. If that's the case, for the price of some extra (cheap!) "get record" calls, you can avoid a lot of (expensive!) "edit record" calls (see the second sketch after this list).

    3) Revisit why we are doing this in the first place. When we are dealing with cloned data, our first naive thought is that loading it into database records is obviously good. But at scale it often isn't. There can be other, cheaper ways of using the CSV if you just want it hanging around for cross-referencing. Getting at the need underneath the request can drive a 100x result - something radically faster, simpler, and more reliable.
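
    Here is a minimal sketch of principle 1, written as a Python script that feeds the CSV to an import endpoint one page at a time. The endpoint URL, page size, and payload shape are assumptions for illustration only; inside Xano the paging itself would live in a function stack, this just shows the idea of never handling more than one page per request.

    ```python
    import csv
    import requests

    PAGE_SIZE = 1000  # records per request; tune to stay within memory/time limits
    ENDPOINT = "https://your-workspace.xano.io/api:example/import_page"  # hypothetical endpoint

    def pages(path, size=PAGE_SIZE):
        """Yield lists of row dicts, `size` rows at a time, without loading the whole file."""
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            page = []
            for row in reader:
                page.append(row)
                if len(page) == size:
                    yield page
                    page = []
            if page:
                yield page

    for i, page in enumerate(pages("dump.csv")):
        # Each call lets the endpoint work on ~1000 records in isolation.
        resp = requests.post(ENDPOINT, json={"rows": page})
        resp.raise_for_status()
        print(f"page {i}: sent {len(page)} rows")
    ```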
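
    And a minimal sketch of principle 2: only write when something actually changed. This is plain Python standing in for whatever "Get Record" / "Edit Record" / "Add Record" steps you use; the key field (external_id) and the compared columns are made-up assumptions, not your actual schema.

    ```python
    COMPARE_FIELDS = ["name", "status", "price"]  # hypothetical subset of the ~25 columns

    def sync_page(rows, get_record, edit_record, add_record):
        """Sync one page of CSV rows, skipping records that are already up to date."""
        added = edited = skipped = 0
        for row in rows:
            existing = get_record(row["external_id"])  # cheap lookup by your unique key
            if existing is None:
                add_record(row)                        # brand new record
                added += 1
            elif any(str(existing.get(f)) != str(row.get(f)) for f in COMPARE_FIELDS):
                edit_record(existing["id"], row)       # expensive write, only when needed
                edited += 1
            else:
                skipped += 1                           # identical record, no write at all
        return {"added": added, "edited": edited, "skipped": skipped}
    ```

    If only a fraction of the rows actually change between dumps, the skipped branch is where almost all of the savings come from.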

    We've worked through these data sync issues - at scale greater than this! - more than once in the State Change Pro office hours and community. We've found paths forward that included the principles above and some other techniques that really depend on the underlying use case. It's where we work on the hardest 5% of our challenges.

  • Pawel Magdanski Member ✭✭✭

    I can highly recommend @Ray Deck's StateChangePro community, as I am a member myself. I'm sure you will get the right guidance there on optimizing tasks like this.

  • bren Member

    Hey guys, thanks for the replies. Really appreciate them.
    @Ray Deck thanks for the detailed reply. I'll give your points some thought, especially point 2, as from what I've been led to believe only about 1,000 records are modified or added each week.
    Will definitely look into joining your community in the future.
    I'll also let the community know what solution I settle on once it's done, so hopefully it might offer some ideas to others in a similar situation.
    cheers
    Bren