The Quest
PostgreSQL is a hip and happening DBMS that’s gaining more and more popularity. It’s got all these cool features that make its users feel like they’re in the front row of a rock concert.
Unlike most of its competitors, PostgreSQL goes the extra mile and maintains a fancy web page where they list out all the features they’ve added in each version of this DBMS. I mean, who does that? Kudos to them!
So, here’s what I did. I copied and pasted all that feature goodness into Excel because, let’s face it, that’s way faster than writing a whole scraping program from scratch. Ain’t nobody got time for that!
With the data in hand, I decided to whip up a data visualization using Tableau. And let me tell you, it looked pretty darn snazzy. But there was one tiny little problem – it lacked explanations for each of those mind-blowing features I had showcased in my visualization. Oops!
But fear not, my friends, for I had a brilliant idea. I thought, "Why not ask ChatGPT, using some Python magic and the power of APIs, to explain all 351 features I had in my data and store the results in a neat little table?"
And just like that, I embarked on a quest to uncover the secrets behind each and every PostgreSQL feature. Now, armed with wit, determination, and my trusty Python skills, I set forth to extract those elusive feature descriptions and conquer the data mountain. And guess what? I almost succeeded!
So here I am, with a shiny new table filled with the wisdom of PostgreSQL's 351 features, courtesy of the gpt-3.5-turbo and gpt-4 models. I didn't double-check all the feature descriptions, but of the dozen or so that I did check: about 80% were correctly described with helpful examples, 20% had bad or unhelpful samples, and 5% were misleading.
My final thought: if you find the dashboard useful and want to improve or correct the results, it would be cool to federate and build a solution to store human votes and human corrections.
The Result: PostgreSQL Features Matrix GPT-Optimized
The first Python program used to query the ChatGPT API
import openai
import pyodbc

openai.api_key = 'xx-yoursecretkey-xxx'

# Yes, it is a SQL Server connection to store PostgreSQL features
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;DATABASE=RDBMS_Features;Trusted_Connection=Yes')
cursor = conn.cursor()

# You can test on a smaller dataset using a TOP X or LIMIT X clause
query = "SELECT 'Explain PostgreSQL feature : ' + FeatureName AS Question, FeatureName FROM postgresql_features WHERE FeatureDescription IS NULL"

# Run the SQL query
cursor.execute(query)

# Loop over the features that still lack a description
for row in cursor.fetchall():
    feature = row.FeatureName
    question = row.Question

    # Call the OpenAI completion API
    response = openai.Completion.create(
        engine='text-davinci-003',
        prompt=question,
        max_tokens=1000,
        temperature=0.7,
        n=1,
        stop=None
    )
    answer = response.choices[0].text.strip()

    # Update the target table with the answer
    update_query = "UPDATE postgresql_features SET FeatureDescription = ? WHERE FeatureName = ?"
    cursor.execute(update_query, (answer, feature))
    conn.commit()

conn.close()
Pricing and models: switching to GPT-3.5 Turbo
The previous sample code uses a completion engine (text-davinci-003), which simply completes a single, well-formed prompt. OpenAI's chat models, on the other hand, let you better tune the question by giving the chatbot a system role and, if you want, seed a dialog by writing the first answer messages yourself and passing them to the AI.
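To make the system-role and seeded-dialog mechanism concrete, here is a minimal sketch of how such a messages list can be built. The hand-written "JSONB" question/answer pair is purely illustrative (it is not part of my actual script); it shows how you could steer the answer format by providing a first exchange yourself:

```python
def build_messages(feature_name):
    """Build a chat 'messages' list: a system role, a hand-written first
    exchange to steer the answer format, then the real question."""
    return [
        # The system role frames how the assistant should behave
        {"role": "system", "content": "You are a PostgreSQL database expert."},
        # An illustrative first turn we write ourselves (user + assistant)
        {"role": "user", "content": 'Explain the PostgreSQL feature : "JSONB"'},
        {"role": "assistant",
         "content": "JSONB stores JSON documents in a decomposed binary format.\n"
                    "Example:\nSELECT '{\"a\": 1}'::jsonb;"},
        # The actual question for the current feature
        {"role": "user",
         "content": f'Explain the PostgreSQL feature : "{feature_name}"'},
    ]

messages = build_messages("Parallel Query")
```

The resulting list is what you would pass as the `messages` argument of the chat API call, instead of a single `prompt` string.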
Chat models also have one big advantage compared to the completion engine: they are 10 times cheaper (see the OpenAI documentation).
So here is code that you can use with the chat models:
import openai
import pyodbc

openai.api_key = 'xx-yoursecretkey-xxx'

# Connect to the SQL Server database
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;DATABASE=RDBMS_Features;Trusted_Connection=Yes')

# Create a cursor to run the SQL queries
cursor = conn.cursor()

# SQL query selecting the questions that have no answer yet
query = """SELECT TOP 352 'Explain the PostgreSQL feature : "' + FeatureName + '" and give me up to 3 sample usages in SQL of this feature (SQL or parameter settings). Always start the examples section after the description of the feature and start it with the prefix "Example:". Limit examples to 3 and write the examples after the explanation.' AS Question, FeatureName FROM postgresql_features WHERE FeatureSamples_GPT35Turbo IS NULL"""

# Run the SQL query
cursor.execute(query)

# Loop over the questions and get the answers
for row in cursor.fetchall():
    feature = row.FeatureName
    question = row.Question

    system_msg = "You are a PostgreSQL database expert."
    messages = [{"role": "system", "content": system_msg},
                {"role": "user", "content": question}]

    # Call the ChatGPT API to get the answer
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=messages,
        max_tokens=1000,
        temperature=0.7,
        n=1,
        stop=None
    )
    answer = response["choices"][0]["message"]["content"]
    print(f"answer = {answer}")

    # Update the database with the answer
    update_query = "UPDATE postgresql_features SET FeatureSamples_GPT35Turbo = ? WHERE FeatureName = ?"
    cursor.execute(update_query, (answer, feature))
    conn.commit()

# Close the database connection
conn.close()
Conclusion
It was quite fun to retrieve valuable information from ChatGPT using a Python script. And guess what: 80% of the code was written by ChatGPT itself!
But I still had to explore the OpenAI API myself to discover the messages and roles mechanisms that chat models use (and that are less expensive than the completion engine).
Another thought: you should avoid building a fully dynamic API or dashboard that relays questions/prompts straight to the OpenAI API when you already know the set of questions you want to ask (as in this case):
- First, it could cost you a lot if the dashboard becomes popular
- It is much slower than using a database as an "answers cache"
- It is much greener: you don't burn GPU cycles on an inference every time you click or mouse over a square 🙂
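The "answers cache" pattern can be sketched in a few lines. This is a minimal illustration, not the code behind my dashboard: it uses a sqlite3 table to stay self-contained (instead of the SQL Server database used above), and `ask_model` is a hypothetical stand-in for the real API call:

```python
import sqlite3

def get_description(conn, feature, ask_model):
    """Return the cached description for a feature, calling the model
    only on a cache miss and storing the answer for next time."""
    cur = conn.execute(
        "SELECT description FROM feature_cache WHERE feature = ?", (feature,))
    row = cur.fetchone()
    if row is not None:
        return row[0]                      # cache hit: no API call at all
    answer = ask_model(feature)            # cache miss: one inference, then store
    conn.execute("INSERT INTO feature_cache VALUES (?, ?)", (feature, answer))
    conn.commit()
    return answer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature_cache (feature TEXT PRIMARY KEY, description TEXT)")

calls = []
def fake_model(name):                      # stand-in for the OpenAI call
    calls.append(name)
    return f"description of {name}"

get_description(conn, "JSONB", fake_model)
get_description(conn, "JSONB", fake_model)  # second call is served from the cache
```

After both calls, the model has only been invoked once; every later click on the dashboard reads from the table instead of triggering a new inference.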
At the time of writing, the GPT-4 model was not available through an API call. We tried it manually, and the results were better than GPT-3.5 Turbo's; that is why you may see ChatGPT-4 listed as the source for some feature descriptions.
We have now updated the feature descriptions with the OpenAI GPT-4 model: on one hand the quality of the answers is better, but on the other hand the results were retrieved much more slowly and the cost ($3) was 20 times higher than with gpt-3.5-turbo.
Even though GPT-4 improved the descriptions and samples, some answers or samples remain untrue. And as I said in the introduction: a human review should be done, and even if I could manage to double-check 352 feature descriptions and samples myself, I think a community "application" should be designed to vote on and store comments for each feature description and sample.
If you like the idea: contact us to build this community application together!